Help file for darwin created by darwin on linneus78 on Tue Feb 19 10:54:00 2013
Do not edit this file, this file is built automatically by:
make darwinhelp
AAAToInt
Function AAAToInt - convert a 3 letter amino acid code to an integer
Option: builtin
Calling Sequence: AAAToInt(aa)
Parameters:
Name Type Description
-----------------------------------------------------
aa string three-letter amino acid abbreviations
Returns:
1..20
Synopsis: This function converts a three letter abbreviation for an amino
acid to a posint between 1..20 according to the standard ordering of amino
acids. (see ?aminoacids)
Examples:
> AAAToInt('Val');
20
See Also:
?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCInt
?AminoToInt ?BToInt ?CodonCode ?IntToAAA ?IntToCodon
?AToCInt ?CIntToA ?CodonToA ?IntToAmino
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToB
?AToInt ?CIntToAmino ?CodonToInt ?IntToBase
?BaseToInt ?CIntToCodon ?GeneticCode ?IntToBBB
AC
Class AC - Data structure for storing ACs (Accession numbers) of DB
Template: AC(id)
Fields:
Name Type Description
------------------------------------------------------------------------
id {list,string,structure} ID(s) of Entries in the database DB
PatEntry, Match or Entry data structure
Returns:
AC
Methods: AC_type Entry Sequence
Synopsis: AC is a data structure which holds accession numbers (ACs)
contained in the AC tags of a Darwin-formatted database. ACs
can be used as arguments to other functions, e.g. Entry, Sequence, to
indicate that the Entry or sequence desired is the one with the given AC.
When its arguments are other entry descriptions, AC will attempt to convert
them to ACs. An AC can be given with or without the trailing ';'.
The database contains the semicolon, so if the AC does not have it, one is
added.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> ac := AC('Q62671');
ac := AC(Q62671)
> Entry(ac);
EDD_RATQ62671;Ubiquitin-- ..(1568).. V
> Sequence(ac);
ARRERMTAREEASLRTLEGRRRATLLSARQGMMSARGDFLNYALSLMRSH ..(920).. LAIKTKNFGFV
> AC(Entry(2));
AC(Q43495;)
> AC(PatEntry(10000..10002));
AC(P25623; P25622; Q96VH0;,Q9Z851; Q9JSE4;,P56926;)
> AC(Sequence(Entry(1)));
AC(P15711;)
See Also:
?Entry ?Match ?SearchAC ?Sequence ?Species_Entry
?ID ?PatEntry ?SearchID ?SPCommonName ?SP_Species
APC
Function APC( MA:array(string), Pos:integer )
Returns an APC amino acid if all sequences in MA contain the same
amino acid at position Pos. If a third argument is given, it is a
threshold: the percentage of non-indel characters at Pos must be
greater than or equal to it. Deletions are ignored.
AToCInt
Function AToCInt - One Letter Amino Acid Name to List of Codon Integers
Calling Sequence: AToCInt(AA)
Parameters:
Name Type Description
----------------------------------------
AA string amino acid 1 letter code
Returns:
list
Synopsis: This function converts an amino acid 1 letter code into a list of
the integer codes of the corresponding codons. The amino acid 1 letter code
for the stop codons is '$'.
Examples:
> AToCInt('$');
[49, 51, 57]
> AToCInt(L);
[29, 30, 31, 32, 61, 63]
See Also:
?AAAToInt ?BBBToInt ?CIntToInt ?IntToA ?IntToCInt
?aminoacids ?BToInt ?CodonCode ?IntToAAA ?IntToCodon
?AminoToInt ?CIntToA ?CodonToA ?IntToAmino
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToB
?AToInt ?CIntToAmino ?CodonToInt ?IntToBase
?BaseToInt ?CIntToCodon ?GeneticCode ?IntToBBB
AToCodon
Function AToCodon - One Letter Amino Acid Name to List of Codons
Calling Sequence: AToCodon(AA)
Parameters:
Name Type Description
----------------------------------------
AA string amino acid 1 letter code
Returns:
list
Synopsis: This function converts an amino acid 1 letter code into a list of
the corresponding codons. The amino acid 1 letter code for the stop codons
is '$'.
Examples:
> AToCodon('$');
[TAA, TAG, TGA]
> AToCodon(L);
[CTA, CTC, CTG, CTT, TTA, TTG]
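The lookups performed by AToCodon and AToCInt can be illustrated outside Darwin. The following Python sketch is not Darwin code: the names a_to_codon, a_to_cint and codon_to_cint are hypothetical, the genetic code is the standard one, and the codon numbering (bases ordered A, C, G, T, with AAA = 1 and TTT = 64) is an assumption inferred from the example outputs in this file.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G); '*' marks a stop codon.
NCBI_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), NCBI_AAS)}

def a_to_codon(aa):
    """Codons for a one-letter amino acid code; '$' selects the stop codons."""
    target = "*" if aa == "$" else aa
    return sorted(codon for codon, a in CODON_TO_AA.items() if a == target)

def codon_to_cint(codon):
    """Assumed numbering: bases ordered A, C, G, T; AAA = 1 and TTT = 64."""
    b1, b2, b3 = ("ACGT".index(b) for b in codon)
    return 16 * b1 + 4 * b2 + b3 + 1

def a_to_cint(aa):
    """Integer codes of the codons for a one-letter amino acid code."""
    return [codon_to_cint(c) for c in a_to_codon(aa)]

print(a_to_codon("$"))  # ['TAA', 'TAG', 'TGA']
print(a_to_cint("L"))   # [29, 30, 31, 32, 61, 63]
```

Sorting the codons alphabetically gives the same order as sorting by the assumed codon integer, which is why both functions can share one lookup.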
See Also:
?AAAToInt ?BBBToInt ?CIntToInt ?IntToA ?IntToCInt
?aminoacids ?BToInt ?CodonCode ?IntToAAA ?IntToCodon
?AminoToInt ?CIntToA ?CodonToA ?IntToAmino
?AToCInt ?CIntToAAA ?CodonToCInt ?IntToB
?AToInt ?CIntToAmino ?CodonToInt ?IntToBase
?BaseToInt ?CIntToCodon ?GeneticCode ?IntToBBB
AToInt
Function AToInt - convert a 1 letter amino acid code to an integer
Option: builtin
Calling Sequence: AToInt(aa)
Parameters:
Name Type Description
------------------------------------------------------
aa string single letter amino acid abbreviations
Returns:
0..21
Synopsis: This function converts a one letter abbreviation for an amino acid
to a posint between 1..20 according to the standard ordering of amino acids
(see ?aminoacids). If aa is not an amino acid abbreviation, the value 0 is
returned. If aa is the unknown amino acid X, then the value 21 is returned.
Examples:
> AToInt('V');
20
> AToInt(X);
21
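The mapping described in the synopsis can be sketched in Python. This is only an illustration, not Darwin's implementation: the name a_to_int is hypothetical, and the "standard ordering" is assumed to be alphabetical by full amino acid name (A R N D C Q E G H I L K M F P S T W Y V), which agrees with the examples in this file (V gives 20, Serine gives 16).

```python
# Assumed ordering: alphabetical by full amino acid name (Alanine, Arginine, ...).
AMINO_ORDER = "ARNDCQEGHILKMFPSTWYV"

def a_to_int(aa):
    """Return 1..20 for a standard amino acid, 21 for X, and 0 otherwise."""
    if aa == "X":
        return 21
    if len(aa) != 1:
        return 0
    return AMINO_ORDER.find(aa) + 1  # find() returns -1 when absent, giving 0

print(a_to_int("V"), a_to_int("X"), a_to_int("?"))  # 20 21 0
```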
See Also:
?AAAToInt ?BBBToInt ?CIntToInt ?IntToA ?IntToCInt
?aminoacids ?BToInt ?CodonCode ?IntToAAA ?IntToCodon
?AminoToInt ?CIntToA ?CodonToA ?IntToAmino
?AToCInt ?CIntToAAA ?CodonToCInt ?IntToB
?AToCodon ?CIntToAmino ?CodonToInt ?IntToBase
?BaseToInt ?CIntToCodon ?GeneticCode ?IntToBBB
AaFreqNoPat
Function AaFreqNoPat( DB:database )
Return the count vector of all amino acids or bases in DB.
ActOut
Function ActOut( MA:array(string), ActAA )
Reports the APC positions in which the amino acid is of the type ActAA
AddDeviation
Function AddDeviation - Perturbs the length of the outer branches of a tree.
Calling Sequence: AddDeviation(t)
Parameters:
Name Type Description
-------------------------
t Tree tree
Returns:
Tree
Synopsis: The function AddDeviation perturbs the lengths of the outer
branches of a tree by scaling them by an exponentially distributed factor,
thus removing ultrametricity.
Examples:
> BDTree := BirthDeathTree(0.1, 0.01, 10, 50);
BDTree := Tree(Tree(Tree(Tree(Leaf(S1,50),49.0331,Leaf(S2,50)),46.3559,Tree(Leaf(S3,50),48.4224,Leaf(S4,50))),41.6245,Tree(Leaf(S5,50),48.0734,Tree(Leaf(S6,50),48.4142,Leaf(S7,50)))),34.5821,Tree(Leaf(S8,50),42.2466,Tree(Leaf(S9,50),42.6260,Leaf(S10,50))))
> newTree := AddDeviation(BDTree);
newTree := Tree(Tree(Tree(Tree(Leaf(S1,7.6316),7.1416,Leaf(S2,8.4040)),6.4383,Tree(Leaf(S3,8.6195),7.3261,Leaf(S4,10.7321))),5.4527,Tree(Leaf(S5,15.4179),11.0590,Tree(Leaf(S6,14.3074),11.2550,Leaf(S7,11.8044)))),0,Tree(Leaf(S8,10.8904),4.5808,Tree(Leaf(S9,10.3070),4.8952,Leaf(S10,6.9679))))
See also: ?BirthDeathTree ?ScaleTree ?Tree
AddSpecies
Function AddSpecies( t:Tree, Species:list )
Species: List of species (e.g. {MOUSE, YEAST, ECOLI}), used to distinguish
between paralogous and orthologous changes.
Every node of the tree contains the information of which species were on
the left (t[6]) and on the right (t[7]) side of the branch. If the tree
length is less than 6, then the tree is expanded with 0 at positions 4 and 5.
Align
dynamic programming, alignments
Function Align - align sequences using various modes of dynamic programming
Calling Sequence: Align(seq1,seq2,method,DayMat)
Parameters:
Name Type Description
------------------------------------------------------------------------------
seq1 {ProbSeq,string} pept, nucleot or probabilistic sequence
seq2 {ProbSeq,string} pept, nucleot or probabilistic sequence
method string the mode of dynamic programming to use
DayMat {DayMatrix,list(DayMatrix)} Dayhoff matrices used for alignment
Returns:
Alignment
Synopsis: Align does an alignment of two sequences using the similarity
scores given in the DayMat and the given method. If a single DayMatrix is
given, the alignment is done using it. If a list of DayMatrix is given, it
is understood that the best PAM matrix be used. In this case Align will
also compute the PamDistance and PamVariance between the two sequences. The
method is optional, if not given it assumes Local. The valid methods are:
Local A local alignment will be performed, this means that the best
subsequences of seq1 and seq2 will be selected to be aligned. This
type of alignment gives the highest possible similarity score of any
alignment. This is sometimes called the Smith & Waterman algorithm.
Global A global alignment will be performed, this means that the entire seq1
is aligned against the entire seq2. This may result in a negative
score if the sequences do not align very well. This is sometimes
called the Needleman & Wunsch algorithm.
CFE A Cost-Free ends alignment is done. This is like a Global alignment,
but deletions at the ends of either sequence are not
penalized. In some sense it is between a Local and a Global
alignment.
Shake A forward-backward alignment is performed. This alignment iterates
forward and backwards until the score cannot be increased. In its
forward phase it will start at the given positions for seq1 and seq2 and
find the ends which give a maximal score. From this end, it will
perform backwards dynamic programming to find the optimal beginning,
and so on until convergence. This type of alignment is quite similar
to a Local alignment, but can be directed to focus on a particular
alignment, even though it may not be the best of the two sequences.
If the DayMat is omitted, the global variable DM (if assigned a DayMatrix) is
used, else a PAM-250 matrix is constructed.
If in addition to the method, the keyword "NoSelf" is included, when sequences
of peptides or nucleotides are aligned (excluding ProbSeq), self-matches are
not allowed. That is, if a sequence is aligned to itself (being
structurally the same string, this we call self-alignment), the self-match
(which is trivial) will not be allowed. This is done by giving the
alignment of a position with itself a large penalty. By doing this it is
possible to find repeated patterns. I.e. an alignment with itself, where
the identity is ruled out, will show any repeated patterns. In particular
if the sequences align with an offset of k, then there is a k-long motif
which is repeated in the sequence.
The method to find the approximate PamDistance and variance may not find the
global maximum of the Score, it may find a local maximum. By using the
argument "ApproxPAM=ppp", the search for the maximum will be started at PAM
distance ppp. This may help when we know an approximation of the distance,
or may provide a way of exploring the existence of other local maxima.
Examples:
> Align(AC(P00083),AC(P00091));
Alignment(Sequence(AC('P00083'))[14..92],Sequence(AC('P00091'))[19..97],177.7799,DM,0,0,{Local})
> Align(Entry(1),Entry(2),Local,DMS);
Alignment(Sequence(AC('P15711'))[905..917],Sequence(AC('Q43495'))[13..25],45.1050,DMS[346],80,1153.8025,{Local})
> Align(AC(P13475),AC(P13475),Local,DMS,NoSelf);
Alignment(Sequence(AC('P13475'))[128..178],Sequence(AC('P13475'))[137..188],279.9088,DMS[308],42.1286,98.4150,{Local,NoSelf})
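The difference between the Local and Global modes can be illustrated with a small Python sketch. It is a simplification and not what Align does internally: it uses a fixed match/mismatch score and a linear gap penalty in place of a Dayhoff matrix, returns only the score, and the function names and default scores are hypothetical.

```python
def local_align_score(s1, s2, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (Smith-Waterman)."""
    prev = [0] * (len(s2) + 1)
    best = 0
    for a in s1:
        cur = [0]
        for j, b in enumerate(s2, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            # A cell never drops below 0: the alignment may restart anywhere.
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best

def global_align_score(s1, s2, match=2, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch); may be negative."""
    prev = [j * gap for j in range(len(s2) + 1)]
    for i, a in enumerate(s1, start=1):
        cur = [i * gap]
        for j, b in enumerate(s2, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

print(local_align_score("AAAHELLOAAA", "CCHELLOCC"))  # 10
print(global_align_score("AAAA", "TTTT"))             # -4
```

The local score is never negative because the best subsequences are selected, while the global score can be negative when the full sequences align poorly, as the synopsis notes.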
See Also:
?Alignment ?CodonAlign ?DynProgStrings ?MAlign
?CalculateScore ?DynProgScore ?EstimatePam
AlignNucPepAll
Function AlignNucPepAll
Calling Sequence: AlignNucPepAll(nuc,dm,division,goal,pEntries)
Parameters:
Name Type
---------------------------
nuc NucleotideString
dm DayMatrix
division string
goal numeric
pEntries posint..posint
Returns:
list(NucPepMatch)
Global Variables: DB OneAllMatch_SimilOnly
Synopsis: Match nuc against a complete PepDB or the entries in the range
given by pEntries of PepDB and return all matches reaching goal using dm and
intron scoring according to division.
Examples:
See Also:
?Denormalize ?GlobalNucPepAlign ?NucPepMatch
?GetIntrons ?LocalNucPepAlign ?VisualizeGene
?GetPeptides ?LocalNucPepAlignBestPam ?VisualizeProtein
?GetPosition ?Normalize
AlignNucPepMatch
Function AlignNucPepMatch
Option: builtin
Calling Sequence: AlignNucPepMatch(npm,dm)
Parameters:
Name Type
------------------
npm NucPepMatch
dm DayMatrix
Returns:
NucPepMatch
Synopsis: Returns a new match with additional entries: NucGaps, PepGaps and
Introns defining its alignment.
Examples:
See also: ?NucPepMatch
AlignOneAll
Function AlignOneAll
Option: builtin
Calling Sequence: AlignOneAll(seq,db,day,cutoff,entries)
Parameters:
Name Type Description
-----------------------------------------------------------------------------
seq {posint,string} a sequence or an entry number
db database a DNA or protein database
day DayMatrix scoring matrix
cutoff numeric only matches with score > cutoff will be reported
entries posint..posint (optional) compare these entries in db only
Returns:
list(Match)
Synopsis: Align seq against all members of the database db (or the subset of
entries specified by the entries parameter when present) and return the list
of matches which have a similarity score, using day, which exceeds cutoff.
This function will return only one alignment per database sequence. If seq
is a positive integer, then it is understood to be the sequence in that
entry number. The alignments reported are Local alignments, that is the
best subsequences are matched.
This type of search is similar to what FASTA and BLAST (Basic Local Alignment
Search Tool) do. The main difference between them and AlignOneAll is that
AlignOneAll does not use approximations, it does rigorous dynamic
programming against all the sequences in the database. Its speed is
comparable to the other programs, so we see no reason to use shortcuts when
the exact results are easy to obtain.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> AlignOneAll('NKRSPAASQPPVSRVNPQEESYQKLAMETLEELDWCLD',DB,DM,110);
[Match(168.3,9748355,71916164,38,38,250),
Match(147.5,9749450,71916164,38,38,250),
Match(122.2,9752627,71916164,38,38,250),
Match(122.2,9754188,71916164,38,38,250)]
See also: ?DB ?SearchFrag ?SearchSeqDb
AlignedSeq
Function AlignedSeq( MA:array(string) )
returns for all positions in a multiple alignment the number of
alignable sequences
Alignment
Class Alignment - a protein or DNA pairwise sequence alignment
Template: Alignment(Seq1,Seq2,Score,DayMatrix,PamDistance,PamVariance,modes)
Fields:
Name Type Description
---------------------------------------------------------------
Seq1 string the first protein or DNA sequence
Seq2 string the second protein or DNA sequence
Score numeric score of the alignment
DayMatrix DayMatrix Dayhoff matrix used
PamDistance numeric estimate of the PAM distance or 0
PamVariance numeric variance of the PAM distance or 0
modes set(string) optional modes of alignment
Identity numeric fraction identical positions (0..1)
Length1 posint length of Seq1
Length2 posint length of Seq2
Offset1 integer database offset of Seq1
Offset2 integer database offset of Seq2
PamNumber numeric synonym of PamDistance
Sim numeric synonym of Score
Methods: Alignment_type AlSumm HTMLC LaTeXC lprint Match print
Rand select Sequence string
Synopsis: An Alignment stores the information of a pairwise alignment between
two sequences (protein or DNA). It replaces the Match structure, which is
now obsolete. If the mode for the alignment is just Local or unknown, it is
omitted, otherwise it is a set with one of {Local,Global,CFE,Shake} and
optionally NoSelf.
See Also:
?Align ?DynProgScore ?EstimatePam
?CalculateScore ?DynProgStrings ?MAlignment
AllIndices
Function AllIndices( ma:array(string), t:Tree )
compute and print the Kabat-Wu, Probabilistic and Scale indices
AllRootedTrees
Function AllRootedTrees - Returns all root variants from a tree
Calling Sequence: AllRootedTrees(tree)
Parameters:
Name Type Description
-------------------------------------------------------------
tree Tree the tree structure with arbitrary root position
Returns:
set(Tree)
Synopsis: Returns all root variants from a tree, including the input tree
itself.
Examples:
> t := Tree(Tree(Leaf(A,15),5,Leaf(B,15)),0,Tree(Leaf(C,15),11,Leaf(D,15)));
t := Tree(Tree(Leaf(A,15),5,Leaf(B,15)),0,Tree(Leaf(C,15),11,Leaf(D,15)))
> sAllRootVariants := AllRootedTrees(t);
sAllRootVariants := {Tree(Leaf(A,5),0,Tree(Leaf(B,15),5,Tree(Leaf(C,25),21,Leaf(D,25)),100)),Tree(Leaf(C,2),0,Tree(Tree(Leaf(A,28),18,Leaf(B,28)),2,Leaf(D,6),100)),Tree(Tree(Leaf(A,15),5,Leaf(B,15)),0,Tree(Leaf(C,15),11,Leaf(D,15))),Tree(Tree(Leaf(A,15),5,Tree(Leaf(C,25),21,Leaf(D,25)),100),0,Leaf(B,5)),Tree(Tree(Tree(Leaf(A,28),18,Leaf(B,28)),2,Leaf(C,6),100),0,Leaf(D,2))}
See also: ?AllTernaryRoots ?RotateTree ?Tree
AllTernaryRoots
Function AllTernaryRoots - returns a set of all trees with ternary roots
Calling Sequence: AllTernaryRoots(tree)
Parameters:
Name Type Description
-------------------------------------------------------------
tree Tree the tree structure with arbitrary root position
Returns:
set(Tree)
Synopsis: Returns all possible trees with ternary roots. For each internal
node of the tree (except the original root), a tree is returned where the
root is at distance 0 above the internal node. For all practical purposes
(e.g. reconstruction of ancestral sequences), this has the same effect as
having a ternary root (which is not possible with the Tree data structure).
Examples:
> t := Tree(Tree(Leaf(A,15),5,Leaf(B,15)),0,Tree(Leaf(C,15),11,Leaf(D,15)));
t := Tree(Tree(Leaf(A,15),5,Leaf(B,15)),0,Tree(Leaf(C,15),11,Leaf(D,15)))
> AllTernaryRoots(t);
{Tree(Tree(Leaf(A,10),0,Leaf(B,10)),0,Tree(Leaf(C,20),16,Leaf(D,20))),Tree(Tree(Leaf(A,26),16,Leaf(B,26)),0,Tree(Leaf(C,4),0,Leaf(D,4)))}
See also: ?AllRootedTrees ?PASfromTree ?RotateTree ?Tree
AltGenCode
Function AltGenCode - Use Alternative Translation Tables
Calling Sequence: AltGenCode(transl_table,codon)
Parameters:
Name Type Description
------------------------------------------------------
transl_table integer alternative translation table
codon string 3 DNA bases
Returns:
list
Global Variables: AltGenCode_array
Synopsis: AltGenCode takes a 3 letter codon as an input and returns a list
of the amino acid(s) for which the triplet codes. A codon has more than one
translation when, in addition to its normal translation, it is used as an
alternative start codon (M). Absent codons are not designated as such. They
will return the translation of the standard genetic code. The translation
tables are the same as those of the reference website. Additional
initiation codons may be possible. See the website for more information and
a list of the organisms that use each code.
table number description
---------------------------------------------------------------------
1 The Standard Code
2 The Vertebrate Mitochondrial Code
3 The Yeast Mitochondrial Code
4 The Mold, Protozoan, and Coelenterate Mitochondrial
Code and the Mycoplasma/Spiroplasma Code
5 The Invertebrate Mitochondrial Code
6 The Ciliate, Dasycladacean and Hexamita Nuclear Code
7 deleted
8 deleted
9 The Echinoderm Mitochondrial Code
10 The Euplotid Nuclear Code
11 The Bacterial and Plant Plastid Code
12 The Alternative Yeast Nuclear Code
13 The Ascidian Mitochondrial Code
14 The Flatworm Mitochondrial Code
15 Blepharisma Nuclear Code
16 Chlorophycean Mitochondrial Code
17 not available
18 not available
19 not available
20 not available
21 Trematode Mitochondrial Code
22 Scenedesmus obliquus mitochondrial Code
23 Thraustochytrium Mitochondrial Code
References: www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c#SG4
Examples:
> AltGenCode(11,TTG);
[L, M]
> AltGenCode(11,TTT);
[F]
> AltGenCode(12,CTG);
[S, M]
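The rule stated in the synopsis (normal translation first, followed by M when the codon is also an alternative start codon) can be sketched in Python. This is an illustration only: alt_gen_code is a hypothetical name, the data covers just tables 11 and 12 as I understand them from the NCBI translation tables, and stop codons appear here as '*' rather than Darwin's '$'.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G); '*' marks a stop codon.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
STANDARD_CODE = dict(zip(CODONS, STANDARD))

# Partial, illustrative data for two tables only (assumed, not taken from Darwin).
TABLES = {
    11: {"changes": {}, "starts": {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}},
    12: {"changes": {"CTG": "S"}, "starts": {"CTG", "ATG"}},
}

def alt_gen_code(table, codon):
    """Translation of codon under the given table, plus M for start codons."""
    spec = TABLES[table]
    # Codons without a table-specific change fall back to the standard code.
    result = [spec["changes"].get(codon, STANDARD_CODE[codon])]
    if codon in spec["starts"] and "M" not in result:
        result.append("M")
    return result

print(alt_gen_code(11, "TTG"))  # ['L', 'M']
print(alt_gen_code(12, "CTG"))  # ['S', 'M']
```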
See Also:
?AminoToInt ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt
?antiparallel ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon
?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase
?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB
AminoToInt
Function AminoToInt - convert an amino-acid name to an integer
Option: builtin
Calling Sequence: AminoToInt(aa)
Parameters:
Name Type Description
------------------------------------------
aa string full names for amino acids
Returns:
1..20
Synopsis: This function converts the full name for an amino acid to a posint
between 1..20 according to the standard ordering of amino acids.
Examples:
> AminoToInt('Serine');
16
See Also:
?AAAToInt ?BBBToInt ?CIntToInt ?IntToA ?IntToCInt
?aminoacids ?BToInt ?CodonCode ?IntToAAA ?IntToCodon
?AToCInt ?CIntToA ?CodonToA ?IntToAmino
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToB
?AToInt ?CIntToAmino ?CodonToInt ?IntToBase
?BaseToInt ?CIntToCodon ?GeneticCode ?IntToBBB
ApproxSearchString
Function ApproxSearchString
Option: builtin
Calling Sequence: ApproxSearchString(pat,txt,tol)
Parameters:
Name Type
------------------
pat string
txt string
tol {0, posint}
Returns:
{-1,posint}
Synopsis: The tolerance tol specifies how many mismatches are allowed between
the pattern pat and the body of text txt. If pat is found in txt (within
tol mismatches), the offset in txt is returned. Otherwise, -1 is returned.
Note, spaces count as mismatches and case differences do not count as
mismatches.
Examples:
> txt := 'AAAAAAAAAHeLLoBBBBB';
txt := AAAAAAAAAHeLLoBBBBB
> j := ApproxSearchString('hallo', txt, 1);
j := 9
> j+txt;
HeLLoBBBBB
> ApproxSearchString('nothing', 'N.O.T.H.I.N.G.', 4);
-1
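The matching rule can be sketched in Python. This is not the builtin's implementation: approx_search is a hypothetical name, and the sketch assumes that only substitutions are counted (a Hamming-style comparison of equal-length windows) and that the first qualifying offset is returned.

```python
def approx_search(pat, txt, tol):
    """Offset of the first window of txt within tol mismatches of pat, else -1."""
    p, t = pat.lower(), txt.lower()  # case differences are not mismatches
    for off in range(len(t) - len(p) + 1):
        mismatches = sum(1 for a, b in zip(p, t[off:off + len(p)]) if a != b)
        if mismatches <= tol:
            return off
    return -1

print(approx_search("hallo", "AAAAAAAAAHeLLoBBBBB", 1))   # 9
print(approx_search("nothing", "N.O.T.H.I.N.G.", 4))      # -1
```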
See Also:
?BestSearchString ?MatchRegex ?SearchMultipleString
?CaseSearchString ?SearchApproxString ?SearchString
?HammingSearchString ?SearchDelim
AsciiToInt
Function AsciiToInt - convert a single character to its ascii ordinal number
Option: builtin
Calling Sequence: AsciiToInt(s)
Parameters:
Name Type Description
------------------------------------
s string a string of length 1
Returns:
posint
Synopsis: Converts a single character into its ascii ordinal number. This is
useful when encoding/decoding symbols for dynamic programming. It is also
useful in general for the analysis of raw input.
Examples:
> AsciiToInt('a');
97
> AsciiToInt(' ');
32
See Also:
?AToInt ?IntToA ?SearchDelim
?BestSearchString ?IntToAscii ?SearchMultipleString
?CaseSearchString ?MatchRegex ?SearchString
?HammingSearchString ?SearchApproxString
BBBToInt
Function BBBToInt - Nucleic Acid Three Letter Code To Integer
Option: builtin
Calling Sequence: BBBToInt(nuc)
Parameters:
Name Type Description
--------------------------------------------------
nuc string three letter code for nucleic acid
Returns:
1..5
Synopsis: This function converts the following three letter codes for nucleic
acids Ade, Cyt, Gua, Thy, Ura to the integers 1..5 respectively.
Examples:
> BBBToInt('Ade');
1
See Also:
?AAAToInt ?BaseToInt ?CIntToInt ?IntToA ?IntToCInt
?aminoacids ?BToInt ?CodonCode ?IntToAAA ?IntToCodon
?AminoToInt ?CIntToA ?CodonToA ?IntToAmino
?AToCInt ?CIntToAAA ?CodonToCInt ?IntToB
?AToCodon ?CIntToAmino ?CodonToInt ?IntToBase
?AToInt ?CIntToCodon ?GeneticCode ?IntToBBB
BFGSMinimize
Function BFGSMinimize
Calling Sequence: BFGSMinimize(f,iniguess,epsini,epsfinal)
Parameters:
Name Type
-------------------------
f procedure
iniguess array(numeric)
epsini numeric
epsfinal numeric
Returns:
x, f(x)
Synopsis: The Quasi-Newton approach of the BFGS method is used to find the
(local) minimum of a function f. BFGSMinimize starts at iniguess and stops
either when the distance between the last two points is smaller than
epsfinal or after 1000 iterations without convergence.
See Also:
?DisconMinimize ?MaxLikelihoodSize ?MinimizeBrent ?MinimizeSD
?MaximizeFunc ?Minimize2DFunc ?MinimizeFunc ?NBody
BToInt
Function BToInt - Nucleic Acid One Letter Code To Integer
Option: builtin
Calling Sequence: BToInt(nuc)
Parameters:
Name Type Description
------------------------------------------------
nuc string one letter code for nucleic acid
Returns:
0..6
Synopsis: This function converts the following one letter codes for nucleic
acids A, C, G, T, U, X to the integers 1..6 respectively. If nuc is not one
of these symbols, then 0 is returned.
Examples:
> BToInt('A');
1
> BToInt('R');
0
See Also:
?AAAToInt ?BaseToInt ?CIntToInt ?IntToA ?IntToCInt
?aminoacids ?BBBToInt ?CodonCode ?IntToAAA ?IntToCodon
?AminoToInt ?CIntToA ?CodonToA ?IntToAmino
?AToCInt ?CIntToAAA ?CodonToCInt ?IntToB
?AToCodon ?CIntToAmino ?CodonToInt ?IntToBase
?AToInt ?CIntToCodon ?GeneticCode ?IntToBBB
BackTranscribe
Function BackTranscribe - RNA to DNA
Calling Sequence: BackTranscribe(rna)
Parameters:
Name Type Description
-------------------------------
rna string string of bases
Returns:
string
Synopsis: Replaces all U with T in the string.
Examples:
> BackTranscribe('AUG');
ATG
See also: ?Transcribe
BackTranslate
Function BackTranslate - Protein to DNA
Calling Sequence: BackTranslate(prot,method,k,db)
Parameters:
Name Type Description
-----------------------------------------------------------
prot string protein sequence
method {string,set(string)} the mode of codon selection
k integer window size
db database (opt) database to be used
Returns:
string
Synopsis: Back-translate a protein into DNA. The following methods can be
used:
Random - Select codons randomly
Freq - Select the most frequent codons
Least - Select the least frequent codons/motifs
Reuse - Choose codons favoring tRNA reuse
DynProg - Select codons based on favored motifs in coding DNA (default)
A combination of methods can be given as a set. Some methods require a
database to be loaded. For methods based on codon frequency, DB must contain
the DNA tag, and for DynProg the SEQ tag of DB must be DNA.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> BackTranslate('MAAAT');
> BackTranslate('MAAAT','DynProg',7);
See also: ?Translate
BaseCount
Function BaseCount - Counts the number of DNA bases in a sequence
Calling Sequence: BaseCount(sequ)
Parameters:
Name Type Description
----------------------------
sequ string DNA sequence
Returns:
list
Synopsis: BaseCount counts the number of each base in a DNA sequence and
returns a vector of length 6 with the number of each kind of base A, C, G,
T, U, and X in place numbers 1 through 6 respectively.
Examples:
> BaseCount('ACCGGGTTTUUX');
[1, 2, 3, 3, 2, 1]
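The count vector can be reproduced with a short Python sketch (base_count is a hypothetical name, and characters other than the six listed symbols are simply ignored here, which is an assumption about behaviour this entry does not specify):

```python
def base_count(seq):
    """Counts of A, C, G, T, U and X, in that order."""
    return [seq.count(base) for base in "ACGTUX"]

print(base_count("ACCGGGTTTUUX"))  # [1, 2, 3, 3, 2, 1]
```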
BaseToInt
Function BaseToInt - Nucleic Acid Name To Integer
Option: builtin
Calling Sequence: BaseToInt(nuc)
Parameters:
Name Type Description
------------------------------------------
nuc string full name for a nucleic acid
Returns:
1..5
Synopsis: This function converts the following full names for nucleic acids
Adenine, Cytosine, Guanine, Thymine, Uracil to the integers 1..5
respectively.
Examples:
> BaseToInt('Adenine');
1
See Also:
?AAAToInt ?BBBToInt ?CIntToInt ?IntToA ?IntToCInt
?aminoacids ?BToInt ?CodonCode ?IntToAAA ?IntToCodon
?AminoToInt ?CIntToA ?CodonToA ?IntToAmino
?AToCInt ?CIntToAAA ?CodonToCInt ?IntToB
?AToCodon ?CIntToAmino ?CodonToInt ?IntToBase
?AToInt ?CIntToCodon ?GeneticCode ?IntToBBB
BestSearchString
Function BestSearchString
Calling Sequence: BestSearchString(pat,txt)
Parameters:
Name Type
-------------
pat string
txt string
Returns:
{0,posint}
Global Variables: NumberErrors
Synopsis: The BestSearchString function returns the offset of the best match
of pat in the body of text txt. If no match is found, the first position
(offset 0) is returned.
Examples:
> BestSearchString('CYIQNCPRG', 'PPATBCYTQNCPLGFPTTSPS');
5
> BestSearchString('CYIQNCPRG', 'XXXXXXXXXXXXXXXXX');
0
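A best-match search of this kind can be sketched in Python. The sketch assumes that "best" means the fewest mismatching characters among equal-length windows and that ties go to the earliest offset; best_search is a hypothetical name, and the mismatch count it returns is only for illustration, since this entry does not describe how NumberErrors is used.

```python
def best_search(pat, txt):
    """Return (offset, mismatches) of the window of txt closest to pat."""
    best_off, best_err = 0, len(pat)
    for off in range(len(txt) - len(pat) + 1):
        err = sum(1 for a, b in zip(pat, txt[off:off + len(pat)]) if a != b)
        if err < best_err:  # strict comparison keeps the earliest best window
            best_off, best_err = off, err
    return best_off, best_err

print(best_search("CYIQNCPRG", "PPATBCYTQNCPLGFPTTSPS"))  # (5, 2)
print(best_search("CYIQNCPRG", "XXXXXXXXXXXXXXXXX"))      # (0, 9)
```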
See Also:
?CaseSearchString ?SearchApproxString ?SearchString
?HammingSearchString ?SearchDelim
?MatchRegex ?SearchMultipleString
Beta_Rand
Function Beta_Rand - Generate random Beta distributed reals
Calling Sequence: Rand(Beta(a,b))
Parameters:
Name Type
------------------
a nonnegative
b nonnegative
Returns:
nonnegative
Synopsis: This function returns a random Beta distributed number with average
a/(a+b) and variance a*b/((a+b)^2*(a+b+1)). When a and b are integers, the
Beta distribution corresponds to the distribution of the a-th ordered random
number (U(0,1)) out of a+b-1 numbers. Also, if X1 and X2 are independent
Chi-square distributed numbers with nu1 and nu2 degrees of freedom, X1/(X1+X2)
is Beta(nu1/2,nu2/2) distributed. Beta_Rand uses Rand() which can be seeded
by either the
function SetRand or SetRandSeed.
References: Handbook of Mathematical functions, Abramowitz and Stegun, 26.5
Examples:
> Rand(Beta(3,4));
0.5647
> Rand(Beta(2,100));
0.02392550
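The order-statistic characterization in the synopsis can be checked numerically. The Python sketch below is not how Beta_Rand is implemented; it draws the a-th smallest of a+b-1 uniform numbers (valid for integer a and b only) and compares the sample mean with a/(a+b). All function names are hypothetical.

```python
import random

def beta_by_order_statistic(a, b, rng):
    """a-th smallest of a+b-1 uniform(0,1) draws; Beta(a,b) for integer a, b."""
    draws = sorted(rng.random() for _ in range(a + b - 1))
    return draws[a - 1]

def beta_mean(a, b):
    """Theoretical mean of Beta(a,b)."""
    return a / (a + b)

def beta_variance(a, b):
    """Theoretical variance of Beta(a,b)."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

rng = random.Random(1)  # fixed seed so the run is repeatable
samples = [beta_by_order_statistic(3, 4, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
print(round(beta_mean(3, 4), 4), round(beta_variance(3, 4), 4))
print(round(empirical_mean, 2))
```

With 20000 samples the empirical mean should agree with a/(a+b) to about two decimal places.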
See Also:
?Binomial_Rand ?FDist_Rand ?Normal_Rand ?StatTest
?ChiSquare_Rand ?GammaDist_Rand ?Poisson_Rand ?Std_Score
?CreateRandSeq ?Geometric_Rand ?SetRand ?Student_Rand
?Cumulative ?Graph_Rand ?SetRandSeed ?Zscore
?Exponential_Rand ?Multinomial_Rand ?Shuffle
BinTree
Function BinTree( g:Graph )
Converts a cycle free connected graph to a graph equivalent to a binary
tree by introducing new nodes and edges.
Binomial_Rand
Function Binomial_Rand - Generate random binomially distributed integers
Calling Sequence: Rand(Binomial(n,p))
Binomial_Rand(n,p)
Returns:
integer
Synopsis: This function returns a random integer binomially distributed with
average n*p and variance n*p*(1-p). An example of a binomial distribution
is the number of heads resulting from tossing n times a biased coin (that
will give "heads" with probability p). In mathematical terms, the
probability that the outcome is i is binomial(n,i) * p^i * (1-p)^(n-i) (for
0 <= i <= n). Binomial_Rand uses Rand() which can be seeded by either the
function SetRand or SetRandSeed.
References: Handbook of Mathematical functions, Abramowitz and Stegun,
26.1.20
Examples:
> Rand(Binomial(20,0.3));
7
> Rand(Binomial(1000,0.01));
13
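The coin-tossing description and the probability formula in the synopsis can be written out in Python. This sketch is for illustration and is not the algorithm Binomial_Rand uses; the function names are hypothetical, and math.comb requires Python 3.8 or later.

```python
import math
import random

def binomial_pmf(n, p, i):
    """Probability of exactly i heads in n tosses: C(n,i) * p^i * (1-p)^(n-i)."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)

def binomial_rand(n, p, rng=random):
    """Count heads in n tosses of a coin that shows heads with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)

n, p = 20, 0.3
pmf = [binomial_pmf(n, p, i) for i in range(n + 1)]
mean = sum(i * q for i, q in enumerate(pmf))
variance = sum((i - mean) ** 2 * q for i, q in enumerate(pmf))
print(round(sum(pmf), 6), round(mean, 6), round(variance, 6))  # 1.0 6.0 4.2
```

The printed mean and variance match n*p and n*p*(1-p) from the synopsis.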
See Also:
?Beta_Rand ?FDist_Rand ?Normal_Rand ?StatTest
?ChiSquare_Rand ?GammaDist_Rand ?Poisson_Rand ?Std_Score
?CreateRandSeq ?Geometric_Rand ?SetRand ?Student_Rand
?Cumulative ?Graph_Rand ?SetRandSeed ?Zscore
?Exponential_Rand ?Multinomial_Rand ?Shuffle
BipartiteGraph
Function BipartiteGraph - generate a random bipartite graph
Calling Sequence: BipartiteGraph(n1,n2,e)
Parameters:
Name Type Description
----------------------------------------------------------------
n1 integer optional number of nodes/vertices in first set
n2 integer optional number of nodes/vertices in second set
e integer optional number of edges
Returns:
Graph
Synopsis: Generate a random bipartite graph with n1 nodes in one set, n2
nodes in another set, and e edges connecting the two sets. If e is not
specified, it is chosen at random. If n1 and n2 are not specified, they are
chosen at random between 5 and 20. A complete bipartite graph can be
generated by requesting it to have n1*n2 edges. The edges are otherwise
randomly chosen and have label 0.
Examples:
> BipartiteGraph(3,4,5);
Graph(Edges(Edge(0,1,7),Edge(0,2,5),Edge(0,3,4),Edge(0,3,6),Edge(0,3,7)),Nodes(1,2,3,4,5,6,7))
See Also:
?Clique ?Graph_Rand ?ParseDimacsGraph
?DrawGraph ?Graph_XGMML ?Path
?Edge ?InduceGraph ?RegularGraph
?EdgeComplement ?MaxCut ?ShortestPath
?Edges ?MaxEdgeWeightClique ?TetrahedronGraph
?FindConnectedComponents ?MinCut ?VertexCover
?Graph ?MST
?Graph_minus ?Nodes
BipartiteSquared
Function BipartiteSquared - Computes the distance between two trees
Calling Sequence: BipartiteSquared(tree1,tree2,conf,mode)
Parameters:
Name Type Description
--------------------------------------------------------------------
tree1 Tree first tree
tree2 Tree second tree
conf posint (optional, def=2) size of basic configuration
mode string (optional, def=RF) mode of counting: RF or SizeDiff
Returns:
posint
Synopsis: BipartiteSquared generalizes the Robinson and Foulds (RF) distance
between two trees. The first generalization is with respect to the basic
configurations matched. The RF inspects each internal edge, which separates
the leaves in two (conf=2) sets. It can also inspect internal nodes, which
separate the leaves in 3 groups (conf=3) or in quartets, which separate the
leaves in 4 groups (conf=4). The second generalization is in the way that
the differences are counted. The RF measure is like a Hamming distance: if
the sets of leaves are different it counts 1, and if they are the same it
counts 0 (mode=RF). A second alternative is to count the size of the set
differences, that is for each pair, count the number of leaves in one but
not in the other (mode=SizeDiff).
If the global variable MinLen is assigned a numerical value, any edge whose
length is <= MinLen will be considered non-existent, that is it will not
generate a difference. This is useful when comparing against trees which
are not binary, but multifurcating, like trees derived from taxonomic
information. The name BipartiteSquared comes from the algorithm to compute
the distance which solves two nested weighted bipartite matching problems:
the inner one for finding the minimum cost of a configuration against
another and the outer one for matching the best configurations of each tree.
Examples:
> t1 := Tree(Tree(Leaf(a,2),1,Leaf(b,2)),0,Tree(Leaf(c,2),1,Leaf(d,2))):
> t2 := Tree(Tree(Leaf(a,2),1,Leaf(d,2)),0,Tree(Leaf(c,2),1,Leaf(b,2))):
> BipartiteSquared(t1,t2,2,RF);
1
> BipartiteSquared(t1,t2,2,SizeDiff);
4
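The two counting modes can be illustrated on the single internal edge of each example tree. This is a Python sketch of one reading of the synopsis, not Darwin's implementation: it covers only the inner comparison of one bipartition against another (conf=2), and the name split_cost is hypothetical. The outer matching between the edges of the two trees is omitted because each example tree has only one internal edge.

```python
def split_cost(split1, split2, mode="RF"):
    """Cost of matching one bipartition of the leaves against another."""
    (a1, b1), (a2, b2) = split1, split2
    # The sides of a bipartition are unordered, so try both pairings
    # and keep the cheaper one (sizes of the set differences).
    straight = len(a1 ^ a2) + len(b1 ^ b2)
    crossed = len(a1 ^ b2) + len(b1 ^ a2)
    size_diff = min(straight, crossed)
    if mode == "SizeDiff":
        return size_diff
    # RF mode: Hamming-like, 0 if identical and 1 otherwise.
    return 0 if size_diff == 0 else 1

# Internal edges of t1 = ((a,b),(c,d)) and t2 = ((a,d),(c,b)).
s1 = (frozenset("ab"), frozenset("cd"))
s2 = (frozenset("ad"), frozenset("cb"))
print(split_cost(s1, s2, "RF"))        # 1
print(split_cost(s1, s2, "SizeDiff"))  # 4
```

For this small case the sketch gives the same values as the Darwin example above.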
See Also:
?BootstrapTree ?LeastSquaresTree ?RobinsonFoulds
?ComputeDimensionlessFit ?PhylogeneticTree ?SignedSynteny
?GapTree ?RBFS_Tree ?Synteny
?IntraDistance ?ReconcileTree
BirthDeathTree
Function BirthDeathTree - Generates a tree from a birth-death process
Calling Sequence: BirthDeathTree(lambda,mu,N,h)
Parameters:
Name Type Description
---------------------------------------------------
lambda nonnegative birth rate
mu nonnegative death rate
N posint number of leaves
h positive distance from root to leaves
Returns:
Tree
Synopsis: The function BirthDeathTree generates a tree with N leaves. The
time points of the bifurcations are sampled from a birth-death process with
birth rate lambda and death rate mu over a time span h.
Note: - The resulting tree is ultrametric.
- For mu > 0 the root will usually not be at time 0.
References: Gernhard T. The conditioned reconstructed process. J Theor Biol,
2008, 253(4):769-778
Examples:
> BDTree := BirthDeathTree(0.1, 0.01, 10, 100);
BDTree := Tree(Tree(Tree(Tree(Tree(Leaf(S1,100),96.8853,Leaf(S2,100)),94.1321,Tree(Leaf(S3,100),99.2827,Tree(Leaf(S4,100),99.5675,Leaf(S5,100)))),88.6199,Leaf(S6,100)),81.4629,Tree(Tree(Leaf(S7,100),97.0925,Leaf(S8,100)),96.3985,Leaf(S9,100))),68.6093,Leaf(S10,100))
See also: ?AddDeviation ?ScaleTree ?Tree
Block
Data structure Block( GapList, Left, Right, Sum, NrGaps, NrAA, Score, Pos, Type )
Function: creates a Block data structure
Selectors:
GapList 1 -
Left 2 -
Right 3 -
Sum 4 -
NrGaps 5 -
NrAA 6 -
Score 7 -
Pos 8 -
Type 9 -
gaps, left, right, sum, score, bestpos
BootstrapTree
Function BootstrapTree - assign confidence values to internal nodes or
branches
Calling Sequence: BootstrapTree(Ds,labels,bstype)
BootstrapTree(Ds,labels,nrounds,bstype)
BootstrapTree(treeofall,bstrees,bstype)
Parameters:
Name Type Description
----------------------------------------------------------
Ds array(matrix) Distance matrices
labels array(anything) Labels
nrounds posint (optional) number of rounds
treeofall Tree tree of all data
bstrees array(Tree) trees from bootstrapping
bstype {Branches,Nodes} (optional) type
Returns:
Tree
Synopsis: Depending on the value of 'bstype', this function computes
confidence values for internal nodes (default) or branches. The values are
integers between 0 and 100, denoting how often (in percent) a particular
node or branch occurred during the bootstrapping. By default, 100
bootstrapping trees from randomly selected distance matrices (prob 2/3) are
constructed and evaluated. Typically, each of the input matrices
corresponds to one orthologous group. Alternatively, a tree from all data
plus a list of trees from bootstrapping experiments could be given as
arguments. The confidence values are stored in the fourth field of the Tree
data structure and can be displayed using the option InternalNodes =
ShowBootstrap or BranchDrawing = ShowBootstrap for the DrawTree function. To
make the result more readable, only bootstrap values below 100 percent are
displayed.
Examples:
> T1 := Tree(Tree(Tree(Leaf('A',-3),-2,Leaf('B',-3) ),-1,Leaf('C',-3)),0,
Tree(Tree(Leaf('E',-3),-2,Leaf('F',-3) ),-1,Leaf('D',-3))):
> T2 := Tree(Tree(Tree(Leaf('B',-3),-2,Leaf('C',-3) ),-1,Leaf('A',-3)),0,
Tree(Tree(Leaf('D',-3),-2,Leaf('E',-3) ),-1,Leaf('F',-3))):
> BS1 := BootstrapTree(T1, [T1,T2]);
BS1 := Tree(Tree(Tree(Leaf(A,-3),-2,Leaf(B,-3),50),-1,Leaf(C,-3),50),0,Tree(Tree(Leaf(E,-3),-2,Leaf(F,-3),50),-1,Leaf(D,-3),50))
> DrawTree(BS1, InternalNodes=ShowBootstrap);
> BS2 := BootstrapTree(T1, [T1,T2], Branches);
BS2 := Tree(Tree(Tree(Leaf(A,-3),-2,Leaf(B,-3),50),-1,Leaf(C,-3),100),0,Tree(Tree(Leaf(E,-3),-2,Leaf(F,-3),50),-1,Leaf(D,-3),100))
> DrawTree(BS2, BranchDrawing=ShowBootstrap);
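The branch values in BS2 can be reproduced by counting how often each group of leaves below a branch reappears in the bootstrap trees. The following Python sketch is an illustration under that assumption, not Darwin's code: trees are written as nested tuples without branch lengths, and the names leafset and branch_support are hypothetical.

```python
def leafset(tree, acc):
    """Return the leaves below tree; record every internal node's leaf set in acc."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    leaves = leafset(left, acc) | leafset(right, acc)
    acc.add(leaves)
    return leaves

def branch_support(reference, boot_trees):
    """Percentage of bootstrap trees that contain each leaf group of reference."""
    ref_groups = set()
    leafset(reference, ref_groups)
    boot_groups = []
    for t in boot_trees:
        groups = set()
        leafset(t, groups)
        boot_groups.append(groups)
    return {g: round(100 * sum(g in b for b in boot_groups) / len(boot_groups))
            for g in ref_groups}

T1 = ((("A", "B"), "C"), (("E", "F"), "D"))
T2 = ((("B", "C"), "A"), (("D", "E"), "F"))
support = branch_support(T1, [T1, T2])
for group in sorted(support, key=lambda g: (len(g), sorted(g))):
    print("".join(sorted(group)), support[group])
```

The groups AB and EF occur in one of the two trees (50), while ABC and DEF occur in both (100), in agreement with BS2.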
See Also:
?ComputeDimensionlessFit ?LeastSquaresTree ?RBFS_Tree
?DrawTree ?PhylogeneticTree ?Tree
BrightenColor
Function BrightenColor - Brighten or darken a RGB color
Calling Sequence: BrightenColor(color)
BrightenColor(color,beta)
Parameters:
Name Type Description
--------------------------------------------------------------------------------
color list(nonnegative) a RGB color
beta numeric (optional) amount of increase/decrease in brightness
Returns:
nonnegative : nonnegative
Synopsis: BrightenColor increases or decreases the color intensity. If 0 <
beta < 1, the color gets brighter and if -1 < beta < 0 the color gets
darker. The operation is not necessarily reversible (see example). The
default value for beta is 0.3.
Examples:
> BrightenColor([1,0,0]);
[1, 0.3000, 0.3000]
> BrightenColor([0.5,0.5,0.5], -.2);
[0.3000, 0.2400, 0.2400]
> BrightenColor(BrightenColor([0.3,0.5,0.9],-0.4),0.4);
[0.3600, 0.5400, 0.9000]
See Also:
?ColorPalette ?DrawPointDistribution ?Set
?DrawDistribution ?DrawStackedBar ?SmoothData
?DrawDotplot ?DrawTree ?StartOverlayPlot
?DrawGraph ?GetColorMap ?StopOverlayPlot
?DrawHistogram ?Plot2Gif ?ViewPlot
?DrawPlot ?PlotArguments
CIntToA
Function CIntToA - Integer Codon Representation to Amino Acid Letter
Calling Sequence: CIntToA(codon)
Parameters:
Name Type Description
--------------------------------------
codon integer integer from 1 to 64
Returns:
string
Synopsis: This function converts the integer code for the Codons from 1 to 64
(see ?CodonCode) to the corresponding amino acid one letter code.
The stop codon returns $.
Examples:
> CIntToA(37);
A
> CIntToA(1);
K
See Also:
?AAAToInt ?BaseToInt ?CIntToInt ?IntToA ?IntToCInt
?aminoacids ?BBBToInt ?CodonCode ?IntToAAA ?IntToCodon
?AminoToInt ?BToInt ?CodonToA ?IntToAmino
?AToCInt ?CIntToAAA ?CodonToCInt ?IntToB
?AToCodon ?CIntToAmino ?CodonToInt ?IntToBase
?AToInt ?CIntToCodon ?GeneticCode ?IntToBBB
CIntToAAA
Function CIntToAAA - Integer Codon Representation to Amino Acid 3-Letter Code
Calling Sequence: CIntToAAA(codon)
Parameters:
Name Type Description
--------------------------------------
codon integer integer from 1 to 64
Returns:
string
Synopsis: This function converts the integer code for the Codons from 1 to 64
(see ?CodonCode) to the corresponding amino acid three letter code. The
stop codon returns the string 'Stop'.
Examples:
> CIntToAAA(37);
Ala
> CIntToAAA(1);
Lys
See Also:
?AAAToInt ?BaseToInt ?CIntToInt ?IntToA ?IntToCInt
?aminoacids ?BBBToInt ?CodonCode ?IntToAAA ?IntToCodon
?AminoToInt ?BToInt ?CodonToA ?IntToAmino
?AToCInt ?CIntToA ?CodonToCInt ?IntToB
?AToCodon ?CIntToAmino ?CodonToInt ?IntToBase
?AToInt ?CIntToCodon ?GeneticCode ?IntToBBB
CIntToAmino
Function CIntToAmino - Integer Codon Representation to Amino Acid Name
Calling Sequence: CIntToAmino(codon)
Parameters:
Name Type Description
---------------------------------------------------------
codon integer integer code for codon between 1 and 64
Returns:
string
Synopsis: This function converts the integer code for the Codons from 1 to 64
(see ?CodonCode) to the corresponding amino acid Name. The stop codon
returns the string 'Stop'.
Examples:
> CIntToAmino(12);
Serine
> CIntToAmino(49);
Stop
See Also:
?AAAToInt ?BaseToInt ?CIntToInt ?IntToA ?IntToCInt
?aminoacids ?BBBToInt ?CodonCode ?IntToAAA ?IntToCodon
?AminoToInt ?BToInt ?CodonToA ?IntToAmino
?AToCInt ?CIntToA ?CodonToCInt ?IntToB
?AToCodon ?CIntToAAA ?CodonToInt ?IntToBase
?AToInt ?CIntToCodon ?GeneticCode ?IntToBBB
CIntToCodon
Function CIntToCodon - convert an integer into 3-letter codon
Calling Sequence: CIntToCodon(x)
Parameters:
Name Type Description
----------------------------------------
x integer an integer from 1 to 64
Returns:
three nucleic bases (one letter each)
Synopsis: The 64 different codons over the alphabet {A, C, G, T=U} are
ordered from 1..64. This function converts a number between 1..64 to a
codon.
Examples:
> CIntToCodon(15);
ATG
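The examples in the CIntTo* entries (1 is a Lys codon, 15 is ATG, 37 is an Ala codon, 49 is a stop codon) are consistent with reading the codon as a base-4 number over the ordering A, C, G, T. The Python sketch below assumes exactly that ordering; it is inferred from the examples and is not taken from the Darwin source. The function names are hypothetical.

```python
BASES = "ACGT"  # assumed ordering, inferred from the documented examples

def cint_to_codon(n):
    """Map an integer 1..64 to a codon, first base most significant."""
    if not 1 <= n <= 64:
        raise ValueError("codon integer must be between 1 and 64")
    k = n - 1
    return BASES[k // 16] + BASES[(k // 4) % 4] + BASES[k % 4]

def codon_to_cint(codon):
    """Inverse mapping; U is treated as T."""
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3:
        raise ValueError("a codon has exactly three bases")
    d1, d2, d3 = (BASES.index(b) for b in codon)
    return 16 * d1 + 4 * d2 + d3 + 1

print(cint_to_codon(15))  # ATG
print(cint_to_codon(1), cint_to_codon(37), cint_to_codon(49))  # AAA GCA TAA
```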
See Also:
?AAAToInt ?BaseToInt ?CIntToInt ?IntToA ?IntToCInt
?aminoacids ?BBBToInt ?CodonCode ?IntToAAA ?IntToCodon
?AminoToInt ?BToInt ?CodonToA ?IntToAmino
?AToCInt ?CIntToA ?CodonToCInt ?IntToB
?AToCodon ?CIntToAAA ?CodonToInt ?IntToBase
?AToInt ?CIntToAmino ?GeneticCode ?IntToBBB
CIntToInt
Function CIntToInt - Integer Codon Representation to Amino Acid Number
Calling Sequence: CIntToInt(codon)
Parameters:
Name Type Description
--------------------------------------
codon integer integer from 1 to 64
Returns:
1..22
Synopsis: This function converts the integer code for the Codons from 1 to 64
(see ?CodonCode) to the corresponding amino acid integers (1..20). The stop
codon returns 22.
Examples:
> CIntToInt(37);
1
> CIntToInt(1);
12
See Also:
?AAAToInt ?BaseToInt ?CIntToCodon ?IntToA ?IntToCInt
?aminoacids ?BBBToInt ?CodonCode ?IntToAAA ?IntToCodon
?AminoToInt ?BToInt ?CodonToA ?IntToAmino
?AToCInt ?CIntToA ?CodonToCInt ?IntToB
?AToCodon ?CIntToAAA ?CodonToInt ?IntToBase
?AToInt ?CIntToAmino ?GeneticCode ?IntToBBB
CalculateScore
Function CalculateScore - Score two sequences as is, without aligning them
Calling Sequence: CalculateScore(seq1,seq2,DM)
Parameters:
Name Type Description
---------------------------------------------------------
seq1 string first sequence
seq2 string second sequence
DM DayMatrix Dayhoff matrix to score the sequences
Returns:
numeric
Synopsis: Calculates the score between two sequences as they are, without
aligning them. Sequences may contain '_' indicating an indel. An '_' matched
against another '_', as can arise in sequences taken from a multiple
alignment, is not scored.
Examples:
> CalculateScore(CITKLWDGDQVLY,CLTKIFDGDQVIV,DM);
50.2066
See also: ?Align ?EstimatePam
CallSystem
Function CallSystem
Option: builtin
Calling Sequence: CallSystem(cmd)
Parameters:
Name Type
-------------
cmd string
Returns:
integer
Synopsis: The CallSystem command passes the argument cmd to the underlying
operating system for execution. It returns the integer value returned by
the operating system.
If the results of the execution are to be returned as a string in Darwin,
then the command TimedCallSystem will do this without the need of an
intermediate file. Also the command OpenPipe allows the direct reading of
the output of a system command.
Examples:
> CallSystem('date');
Fri Apr 25 12:39:18 MEST 2003
0
See also: ?FileStat ?LockFile ?OpenPipe ?SystemCommand ?TimedCallSystem
CaseSearchString
Function CaseSearchString - case sensitive exact string searching
Option: builtin
Calling Sequence: CaseSearchString(pat,txt)
Parameters:
Name Type
-------------
pat string
txt string
Returns:
{-1,0,posint}
Synopsis: This returns the offset before the character where pat matches with
txt. If pat does not match txt, -1 is returned.
Examples:
> CaseSearchString('here', 'It is in here');
9
> CaseSearchString('it', 'It is in here');
-1
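The returned offset is the number of characters that precede the match, so a match at the very start of txt gives 0. For readers more familiar with Python, str.find follows the same convention; the sketch below is only an analogy, and the wrapper name case_search is hypothetical.

```python
def case_search(pat, txt):
    """Case sensitive search: offset of the first match, or -1 if none."""
    return txt.find(pat)

print(case_search('here', 'It is in here'))  # 9
print(case_search('it', 'It is in here'))    # -1
print(case_search('It', 'It is in here'))    # 0
```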
See Also:
?BestSearchString ?SearchApproxString ?SearchString
?HammingSearchString ?SearchDelim
?MatchRegex ?SearchMultipleString
CenterTreeRoot
Function CenterTreeRoot - Place root in center of tree
Calling Sequence: CenterTreeRoot(t)
Parameters:
Name Type Description
-------------------------
t Tree a Tree
Returns:
Tree
Synopsis: Place the root of the tree such that the numbers of leaves on each
side are as equal as possible. Useful when drawing circular trees when the
root has been placed far from the center.
Examples:
> t := Tree(Leaf('1',3),0,Tree(Tree(Leaf('2',3),2,Leaf('3',3)),1,Leaf('4',3))):
> CenterTreeRoot(t);
Tree(Tree(Leaf(2,1.5000),0.5000,Leaf(3,1.5000)),0,Tree(Leaf(1,4.5000),0.5000,Leaf(4,2.5000),100))
See also: ?RotateTree ?TreeSize
ChangeLeafLabels
Function ChangeLeafLabels( t:Tree, Labels:list )
Replaces the number of the leaves (t[3]) by the name in the list Labels
CheckAmbigTree
Function CheckAmbigTree( t:Tree )
Tree t must contain species information at position 6 and 7.
To get this, use AddSpecies.
The function checks for violations of rules such as "a is closer to b
than to c in one place but a is closer to c than to b in another place".
The number of violations for each subtree are counted and added to the
tree at position 8.
If a list of rules is given as an additional argument, those rules
are used instead. (The function FindRules finds such rules.)
ChiSquare_Rand
Function ChiSquare_Rand - Generate random Chi-square distributed reals
Calling Sequence: Rand(ChiSquare(nu))
Parameters:
Name Type
------------------
nu nonnegative
Returns:
nonnegative
Synopsis: This function returns a random chi-square distributed number with
average nu and variance 2*nu. When nu is an integer, the sum of the squares
of nu Normal(0,1) variables is distributed as ChiSquare(nu). ChiSquare_Rand
uses Rand() which can be seeded by either the function SetRand or
SetRandSeed.
References: Handbook of Mathematical functions, Abramowitz and Stegun, 26.4
Examples:
> Rand(ChiSquare(3));
1.4350
> Rand(ChiSquare(100));
123.9082
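The relation to Normal(0,1) variables stated in the synopsis can be tried out directly. The Python sketch below is not how ChiSquare_Rand is implemented; it only sums nu squared standard normal draws (integer nu) and compares the sample mean and variance with nu and 2*nu. The name chi_square_sample and the seed are arbitrary.

```python
import random

def chi_square_sample(nu, rng):
    """Sum of the squares of nu independent Normal(0,1) draws (integer nu)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

rng = random.Random(12345)  # fixed seed so that runs are repeatable
draws = [chi_square_sample(3, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)
print(round(mean, 2), round(var, 2))  # should be close to 3 and 6
```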
See Also:
?Beta_Rand ?FDist_Rand ?Normal_Rand ?StatTest
?Binomial_Rand ?GammaDist_Rand ?Poisson_Rand ?Std_Score
?CreateRandSeq ?Geometric_Rand ?SetRand ?Student_Rand
?Cumulative ?Graph_Rand ?SetRandSeed ?Zscore
?Exponential_Rand ?Multinomial_Rand ?Shuffle
Cholesky
Function Cholesky - decomposition of a positive definite matrix A = R * R^t
Option: builtin
Calling Sequence: Cholesky(A)
Parameters:
Name Type Description
------------------------------------
A matrix(numeric) a matrix
Returns:
matrix
Synopsis: R := Cholesky(A) computes the Cholesky decomposition of the matrix
A. A is the input matrix, and must be a square, symmetric, positive
definite matrix. If A does not satisfy these conditions, an error is
returned. R is a square matrix, lower triangular, such that R*transpose(R)
= A. Cholesky is used to check for positive-definiteness, and at the same
time it allows solving a system Ax=b (by doing two back-substitutions) if A
is positive-definite.
Examples:
> A := [[3,1,2],[1,2,-1],[2,-1,5]];
A := [[3, 1, 2], [1, 2, -1], [2, -1, 5]]
> R := Cholesky(A);
R := [[1.7321, 0, 0], [0.5774, 1.2910, 0], [1.1547, -1.2910, 1.4142]]
> R * R^t;
[[3.0000, 1, 2], [1, 2, -1.0000], [2, -1.0000, 5.0000]]
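How the factor R is used to solve Ax=b with two substitutions can be sketched as follows. This is a plain Python illustration of the textbook algorithm, not the builtin's code; the names cholesky and solve are hypothetical.

```python
import math

def cholesky(A):
    """Lower triangular R with R * transpose(R) = A; fails if A is not positive definite."""
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(R[i][k] * R[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                R[i][i] = math.sqrt(s)
            else:
                R[i][j] = s / R[j][j]
    return R

def solve(R, b):
    """Solve R * transpose(R) * x = b by a forward and a backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                      # forward: R * y = b
        y[i] = (b[i] - sum(R[i][k] * y[k] for k in range(i))) / R[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):            # backward: transpose(R) * x = y
        x[i] = (y[i] - sum(R[k][i] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x

A = [[3, 1, 2], [1, 2, -1], [2, -1, 5]]
R = cholesky(A)
x = solve(R, [11, 2, 15])                   # b = A * [1, 2, 3]
print([[round(v, 4) for v in row] for row in R])
print([round(v, 4) for v in x])
```

The printed R agrees with the Darwin example above to four decimals, and x recovers [1, 2, 3].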
See Also:
?convolve ?GivensElim ?matrix
?Eigenvalues ?Identity ?matrix_inverse
?GaussElim ?LinearProgramming ?transpose
CircularTour
Function CircularTour - find a minimal cost Circular tour
Calling Sequence: CircularTour(seqs)
CircularTour(AllAll)
CircularTour(Dist)
Parameters:
Name Type Description
-------------------------------------------------------------------
seqs list(string) a list of Sequences (DNA or proteins)
AllAll matrix(Alignment) all vs all Alignment matrix
Dist matrix(numeric) all vs all distance matrix (symmetric)
Returns:
list(posint)
Synopsis: This is a front-end to ComputeTSP where we give as input either a
set of sequences or a distance matrix or an AllAll matrix and the result is
a minimal cost tour broken at the most convenient place (highest cost). The
input can be:
List of sequences - n sequences. The sequences are aligned all against all
using Global alignments with the default DM matrix. (the rest
is as with AllAll matrix).
AllAll matrix - an n x n symmetric matrix of Alignments. If the Alignments
have a PamDistance, the minimal cost tour is based on
PamDistances. If not it is based on maximizing the Score of
the neighbouring alignments.
Distance matrix - an n x n symmetric distance matrix. The tour is computed to
minimize the sum of the distances.
The output is the list of indices in the best tour of length n.
Examples:
> seqs := [SSSS, AAAA, AAAS, AASS, ASSS, SSSA, SSAA, SAAA]:
> CircularTour(seqs);
[5, 4, 3, 2, 8, 7, 6, 1]
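The step of breaking the circular tour at its most expensive link can be sketched in a few lines. The Python fragment below assumes that a closed tour has already been found (in Darwin this is the job of ComputeTSP); the name break_tour and the small distance matrix are made up for illustration, and indices are 0-based here whereas Darwin lists are 1-based.

```python
def break_tour(tour, dist):
    """Open a closed tour at its highest-cost link and return the linear order."""
    n = len(tour)
    # Cost of the link from position i to the next position, wrapping around.
    costs = [dist[tour[i]][tour[(i + 1) % n]] for i in range(n)]
    cut = max(range(n), key=costs.__getitem__)
    # Start right after the removed link so that it is no longer traversed.
    return tour[cut + 1:] + tour[:cut + 1]

dist = [[0, 1, 9, 2],
        [1, 0, 5, 9],
        [9, 5, 0, 1],
        [2, 9, 1, 0]]
print(break_tour([0, 1, 2, 3], dist))  # [2, 3, 0, 1]
```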
See also: ?Clusters ?ComputeTSP ?FindCircularOrder ?MAlign
Clique
Function Clique - Maximum clique exact/approximate algorithm
Calling Sequence: Clique(A)
Parameters:
Name Type Description
--------------------------
A Graph a Graph
Returns:
set
Global Variables: CliqueUpperBound
Synopsis: The input to this algorithm is an undirected graph. An undirected
graph is represented as a Graph data structure which should accept two
selectors: Nodes and Edges. The Maximum Clique problem is finding a set of
completely connected vertices which is of maximum size.
The output is a set of the Nodes in the clique. The algorithm computes an
upper bound on the size of the maximum clique which is left in the global
variable CliqueUpperBound. If this coincides with the size of the answer,
it means that the answer is optimal (maximal). The global variable
CliqueIterFactor may be assigned a non-negative number f. The algorithm
will then run for f*n^2 iterations. If f=0 then only the greedy heuristic
is run, and this is quite fast. The larger f, the more accurate the answers
will be, and the more time the algorithm will consume.
The Clique problem is closely related to the Vertex Cover problem. They can
be related by the following formula:
Clique(G) = NodeComplement(VertexCover(EdgeComplement(G)))
Examples:
> hex := HexahedronGraph();
hex := Graph(Edges(Edge(0,1,2),Edge(0,1,4),Edge(0,1,5),Edge(0,2,3),Edge(0,2,6),Edge(0,3,4),Edge(0,3,7),Edge(0,4,8),Edge(0,5,6),Edge(0,5,8),Edge(0,6,7),Edge(0,7,8)),Nodes(1,2,3,4,5,6,7,8))
> Clique(hex);
{7,8}
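The relation between Clique and VertexCover can be verified by exhaustive search on a small graph. The Python sketch below is not Darwin's algorithm (which is iterative and may be approximate); it enumerates subsets, which is only feasible for a handful of nodes, and all function names are hypothetical. The edge list is the hexahedron from the example above.

```python
from itertools import combinations

def max_clique(nodes, edges):
    """Largest set of pairwise connected nodes, by brute force."""
    for k in range(len(nodes), 0, -1):
        for sub in combinations(nodes, k):
            if all(frozenset(p) in edges for p in combinations(sub, 2)):
                return set(sub)
    return set()

def edge_complement(nodes, edges):
    """All node pairs that are not edges of the graph."""
    return {frozenset(p) for p in combinations(nodes, 2)} - edges

def min_vertex_cover(nodes, edges):
    """Smallest set of nodes touching every edge, by brute force."""
    for k in range(len(nodes) + 1):
        for sub in combinations(nodes, k):
            chosen = set(sub)
            if all(e & chosen for e in edges):
                return chosen
    return set(nodes)

nodes = list(range(1, 9))
pairs = [(1, 2), (1, 4), (1, 5), (2, 3), (2, 6), (3, 4),
         (3, 7), (4, 8), (5, 6), (5, 8), (6, 7), (7, 8)]
edges = {frozenset(p) for p in pairs}

clique = max_clique(nodes, edges)
cover = min_vertex_cover(nodes, edge_complement(nodes, edges))
via_cover = set(nodes) - cover   # node complement of the cover
print(len(clique), len(via_cover))
```

Both routes give a clique of size 2 for the hexahedron, although not necessarily the same pair of nodes.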
See Also:
?BipartiteGraph ?Graph_Rand ?ParseDimacsGraph
?DrawGraph ?Graph_XGMML ?Path
?Edge ?InduceGraph ?RegularGraph
?EdgeComplement ?MaxCut ?ShortestPath
?Edges ?MaxEdgeWeightClique ?TetrahedronGraph
?FindConnectedComponents ?MinCut ?VertexCover
?Graph ?MST
?Graph_minus ?Nodes
Clustal
Function Clustal( ma:array(string) )
Use clustal program to align sequences.
ClustalMSA
Function ClustalMSA - Multiple sequence alignment using clustalw2
Calling Sequence: ClustalMSA(seqs,{optional_args})
Parameters:
Name Type Description
--------------------------------------------------------------------------------------------------------------
GENERAL SETTINGS
seqs list(string) sequences to align
labels list(string) (opt) sequence labels
bootstrap numeric (opt) nr. of bootstraps
quicktree boolean (opt) FAST algo for guide tree?
seqtype string (opt) type of sequence
tmpdir string (opt) dir for tempfiles
FAST PAIRWISE AL.
ktuple numeric (opt) word size
topdiags numeric (opt) nr. of best diag.
window numeric (opt) window around best diag.
pairgap numeric (opt) gap penalty
SLOW PAIRWISE AL.
pwmatrix {CodonMatrix,DayMatrix,string} (opt) protein weight matrix
pwgapopen numeric (opt) gap open penalty
pwgapext numeric (opt) gap ext. penalty
MULTIPLE AL.
msamatrix {CodonMatrix,DayMatrix,string} (opt) protein weight matrix
gapopen numeric (opt) gap opening penalty
gapext numeric (opt) gap ext. penalty
endgaps boolean (opt) no end gap sep. penalty
gapdist numeric (opt) gap sep. penalty range
nogap boolean (opt) residue-spec. gaps off
nohgap boolean (opt) hydrophilic gaps off
maxdiv numeric (opt) % ident. for delay
transweight numeric (opt) transitions weighting
iteration string (opt) NONE, TREE or ALIGNMENT
numiter numeric (opt) max nr of iterations
STRUCTURE AL.
helixgap numeric (opt) gap penalty for helix core residues
strandgap numeric (opt) gap penalty for strand core residues
loopgap numeric (opt) gap penalty for loop regions
terminalgap numeric (opt) gap penalty for structure termini
helixendin numeric (opt) nr of res. inside helix to be treated as terminal
helixendout numeric (opt) nr of res. outside helix to be treated as terminal
strandendin numeric (opt) nr of res. inside strand to be treated as terminal
strandendout numeric (opt) nr of res. outside strand to be treated as terminal
Returns:
MAlignment
Synopsis: ClustalMSA computes a multiple sequence alignment (MSA). If no
Dayhoff or Codon matrix is passed, clustalw uses the Gonnet scoring matrix.
The score and upper-bound score in the MAlignment data structure are left
undefined. The function works only in unix/linux, and assumes that clustalw
is available (set environment variable $Clustalw to point to binary). More
information and source of clustalw is available at
'http://www.clustal.org/'.
Optional arguments and their default values:
seqs: true
bootstrap: 1000
quicktree: false
seqtype: guessed from seqs, {PROTEIN, DNA}
tmpdir: /tmp
ktuple: 1
topdiags: 5
window: 5
pairgap: 3
pwmatrix: GONNET, {DayMatrix, CodonMatrix, 'GONNET', 'BLOSUM', 'PAM', 'ID'}
pwgapopen: 10
pwgapext: 0.1
msamatrix: GONNET, {DayMatrix, CodonMatrix, 'GONNET', 'BLOSUM', 'PAM', 'ID'}
gapopen: 10
gapext: 0.2
endgaps: false
gapdist: 4
nogap: false
maxdiv: 30
transweight: 0.5
iteration: NONE, {'NONE', 'TREE', 'ALIGNMENT'}
numiter: 0
helixgap: 4
strandgap: 4
loopgap: 1
terminalgap: 2
helixendin: 3
helixendout: 0
strandendin: 1
strandendout: 1
Examples:
> msa := ClustalMSA(['ASDFAARA','ASDAVRA','ASFDAATA','ASGDAGTA']);
> print(msa);
Multiple sequence alignment:
----------------------------
Score of the alignment: 0
Maximum possible score: 1.7976931e+308
1 ASDFAARA
2 AS_DAVRA
3 ASFDAATA
4 ASGDAGTA
See also: ?Align ?Alignment ?MafftMSA ?MAlign ?MAlignment
ClusterRelPam
Function ClusterRelPam( MinSquareTree:Tree, MaxPW:array )
returns an array of array of clusters for the Pam windows.
Each sequence from SeqToMul can be addressed directly by
[PAMwindow_no, Cluster_no, Sequence_no]
Clusters
Function Clusters - find Clusters of seqs or objects
Calling Sequence: Clusters(seqs,lim)
Clusters(AllAll,lim)
Clusters(Dist,lim)
Parameters:
Name Type Description
-------------------------------------------------------------------
seqs list(string) a list of Sequences (DNA or proteins)
AllAll matrix(Alignment) all vs all Alignment matrix
Dist matrix(numeric) all vs all distance matrix (symmetric)
lim symbol = positive mode and value used to define clusters
Returns:
list(set(posint))
Synopsis: This function finds clusters in a set of sequences or any objects
from their distance or similarity constraints. The input is either a set of
sequences or a distance matrix or an AllAll matrix and the result is a list
of sets of clusters. The components of the clusters are identified by the
indices to the seqs or AllAll or Dist arrays. The parameters can be:
List of sequences - n sequences. The sequences are aligned all against all
using Global alignments with the default DM matrix. (the rest
is as with AllAll matrix).
AllAll matrix - an n x n symmetric matrix of Alignments. If the cluster
definition is based on MaxDistance=ddd or AveDistance=ddd then
the clusters are selected so that the PamDistance (or average)
of the Alignments are less than ddd. If MinSimil=sss or
AveSimil=sss is specified, then the clusters will be determined
by the Score (or average) of the Alignments being larger than
sss.
Distance matrix - an n x n symmetric distance matrix. MaxDistance=ddd or
AveDistance=ddd should be specified and the clusters are
determined by this maximum/average distance.
MaxDistance = ddd - The clusters are determined by the distance ddd. I.e. any
two sequences or objects which are separated by less than ddd
will be part of the same cluster
AveDistance = ddd - The clusters are determined by the distance ddd. The
clusters are built one at a time, starting with the first
sequence/object and adding one member at a time. The member
added is the one whose average distance to the rest of the
cluster is less than ddd. The clusters built this way may
depend on the order of the input sequences.
MinSimil = sss - Like MaxDistance, but the selection criterion is based on
Similarity or Score being greater than sss.
AveSimil = sss - Like AveDistance, but the selection criterion is based on the
average Similarity or Score being greater than sss.
The output is the list of sets of indices. Each set is a cluster. All
indices are included, hence some clusters may be singletons.
Examples:
> seqs := [SSSSS, AAAAA, AAAAS, SASSS, SSSSA, ASAAA]:
> Clusters(seqs,AveSimil=8);
[{1,4,5}, {2,3,6}]
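The MaxDistance rule can be read as: link every pair of objects closer than ddd and report the connected groups. The Python sketch below implements that reading for a plain distance matrix. It is an interpretation of the description, not Darwin's code; the name clusters_max_distance and the sample matrix are made up, and indices are 0-based here.

```python
def clusters_max_distance(dist, limit):
    """Group indices so that any two objects closer than limit share a cluster."""
    n = len(dist)
    assigned = set()
    result = []
    for start in range(n):
        if start in assigned:
            continue
        cluster = set()
        stack = [start]
        while stack:                      # depth-first walk over close pairs
            i = stack.pop()
            if i in cluster:
                continue
            cluster.add(i)
            stack.extend(j for j in range(n)
                         if j not in cluster and dist[i][j] < limit)
        assigned |= cluster
        result.append(cluster)
    return result

dist = [[0, 1, 10, 10, 10],
        [1, 0, 10, 10, 10],
        [10, 10, 0, 2, 10],
        [10, 10, 2, 0, 10],
        [10, 10, 10, 10, 0]]
print(clusters_max_distance(dist, 5))  # [{0, 1}, {2, 3}, {4}]
```

As in the Darwin output, every index appears exactly once, so isolated objects come out as singleton clusters.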
See also: ?CircularTour ?ComputeTSP ?FindCircularOrder ?MAlign
Code
Class Code - placeholder for text that should be displayed "as is"
Template: Code(string1,...)
Fields:
Name Type Description
---------------------------------------------
string1 string text to be displayed as is
Returns:
Code
Methods: Code_type HTMLC LaTeXC print string
Synopsis: The Code data structure holds text that is to be displayed
preserving all spaces, tabs, newlines, etc. This is what is expected to
happen to a program. The content of Code will normally be displayed with
constant-width font. Any newlines appearing in the argument strings will be
displayed. Additionally, a newline is inserted at the end of every
argument. Arguments of Code will be displayed in new lines. So if the
insertion of code is desired within a sentence, the TT() structure should be
used (constant width font).
Examples:
> Code( 'for i to 10 do lprint(i^2) od');
Code(for i to 10 do lprint(i^2) od)
See Also:
?Block ?HTML ?List ?RunDarwinSession
?Color ?HyperLink ?Paragraph ?screenwidth
?Copyright ?Indent ?PostscriptFigure ?Table
?DocEl ?LastUpdatedBy ?print ?TT
?Document ?latex ?Roman ?View
CodonAlign
Function CodonAlign - align codon sequences using dynamic programming
Calling Sequence: CodonAlign(seq1,seq2,method,cm)
Parameters:
Name Type Description
-----------------------------------------------------------------------------
seq1 string codon sequence
seq2 string codon sequence
method string the mode of dynamic programming to use
cm {DayMatrix,list(DayMatrix)} codon matrices used for alignment
Returns:
Alignment
Global Variables: logPAM1
Synopsis: CodonAlign does an alignment of two codon sequences using the
similarity scores given in the DayMatrix (of type 'Codon') and the given
method. If a single DayMatrix is given, the alignment is done using it. If
a list of DayMatrix is given, the best CodonPAM matrix from the list is
used. Since the introduction of the generic dynamic programming,
CodonAlign is only a wrapper function. It extracts the DNA sequence from an
entry and converts the codon sequence to a character string for the generic
Align function.
Examples:
> CodonAlign('AAACCCGGG','AAGCCGGGG', CM);
Alignment('AVq','CWq',30.2765,DM,0,0)
> CodonAlign('AAACCCGGG','AAGCCGGGG',CMS);
Alignment('AVq','CWq',34.3914,DMS[345],79,30984.0898)
See also: ?Align ?CodonDynProgStrings ?CreateCodonMatrices ?DayMatrix
CodonCount
Function CodonCount - Count the number of codons
Calling Sequence: CodonCount()
CodonCount(dna)
Parameters:
Name Type Description
--------------------------------------
dna string a string of coding DNA
Returns:
list
Global Variables: CodonCountsG DBmarkG
Synopsis: The function CodonCount counts all codons in the loaded database
(if no arguments are given) or counts the codons in a DNA sequence coding for
a protein (given as an argument). The function returns a list of codon
occurrences.
See also: ?CodonUsage
CodonDynProgStrings
Function CodonDynProgStrings - compute score and aligned strings from a codon
alignment
Calling Sequence: CodonDynProgStrings(al)
Parameters:
Name Type Description
----------------------------------
al Alignment Codon alignment
Returns:
[numeric, string, string] : [score,seq1,seq2]
Synopsis: Returns a list with the similarity score, first sequence and second
sequence suitable for printing the aligned DNA sequences (with '___'
inserted at gap positions).
Examples:
> al := CodonAlign(AAACCCGGGTTT,AAACCTTTT,CMS,Global);
al := Alignment('AVq#','AX#',10.7382,DMS[368],102,47328.8945,{Global})
> CodonDynProgStrings(al);
[10.7382, AAACCCGGGTTT, AAACCT___TTT]
See also: ?CodonAlign ?CreateCodonMatrices ?EstimateCodonPAM
CodonMatrix
Class CodonMatrix - a codon mutation matrix
Template: CodonMatrix()
CodonMatrix(Sim, Desc, CodonPam)
CodonMatrix(Sim, Desc, CodonPam, AAPAM)
CodonMatrix(Sim, Desc, CodonPam, AAPAM, FixedDel, IncDel)
Fields:
Name Type Description
-------------------------------------------------------------------------------------------
Sim matrix(numeric,64) 64 x 64 codon similarity matrix
Desc string a description
CodonPam numeric CodonPam number of the matrix
AAPam numeric the equivalent PAM distance
FixedDel numeric the constant part of the deletion costs
IncDel numeric the length-dependent part of the deletion costs
PamDistance numeric synonym of CodonPam
PamNumber numeric synonym of CodonPam
MaxSim numeric the highest similarity score in the matrix
MinSim numeric the lowest similarity score in the matrix
MaxOffDiag numeric the highest similarity score that is not in the diagonal
Type string synonym of Desc
Description string synonym of Desc
Methods: CodonMatrix_type lprint print Rand select string
Synopsis: A CodonMatrix contains everything that is needed to score codon
alignments. This is basically the 64x64 scoring matrix plus the deletion
cost function. These costs are based on the PAM distance equivalent and are
calculated automatically if they are not given as an argument. A
CodonMatrix is now only used for storing SynPAM matrices.
See also: ?CreateSynMatrices ?EstimateSynPAM
CodonMutate
Function CodonMutate - randomly evolve a codon sequence
Calling Sequence: CodonMutate(seq1,cpam)
CodonMutate(seq1,cpam,DelType,lnM1)
Parameters:
Name Type Description
-------------------------------------------------------------
seq1 string codon sequence
cpam positive CodonPAM distance to mutate
DelType ExpGaps (optional) gap type
lnM1 matrix(numeric) (optional) log. of a 1-PAM matrix
Returns:
string
Synopsis: Mutates a sequence of codons over a certain CodonPAM distance.
Stop codons always mutate to stop codons while sense codons always mutate to
sense codons. When a gap type is given, the function returns not only the
mutated string, but also the two aligned sequences, where the exact position
of the gaps can be seen. lnM1 is by default assumed to be CodonLogPAM1, which
must be created with CreateCodonMatrices() first.
Examples:
> CodonMutate(CCCATCAACACTGAC,50);
CCTATCGCCACCGAC
See also: ?CreateCodonMatrices ?CreateRandSeq ?Mutate
CodonPamToPam
Function CodonPamToPam - Convert CodonPAM to PAM.
Calling Sequence: CodonPamToPam(lnM1,CF,CodonPam)
Parameters:
Name Type Description
---------------------------------------------------------------------------
lnM1 matrix(numeric,64) Logarithm of a 1-PAM codon mutation matrix.
CF array(numeric,64) Codon frequencies
CodonPam numeric CodonPAM to be converted
Returns:
numeric
Synopsis: Converts CodonPAM to PAM. This conversion depends on the amount of
synonymous mutations for a species or set of species, so the logarithm of
the 1-CodonPAM matrix and the codon frequencies are required as arguments.
The conversion is done by summing up the percentage of synonymous mutations
in the codon matrix. This sum is the expected percentage of identical amino
acids at this CodonPAM distance, which then can be converted to PAM using
the PerIdentToPam function.
Examples:
> CodonPamToPam(CodonLogPAM1,CF,50);
23.3413
See also: ?CreateCodonMatrices ?PamToCodonPam ?PerIdentToPam
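Illustration (Python, not Darwin): a sketch of the summation step described
in the synopsis, on a toy 3-codon alphabet. It assumes M[i][j] is the
probability that codon i mutates to codon j at the given CodonPAM distance
(rows sum to 1); Darwin may store the matrix in a different orientation. The
final conversion with PerIdentToPam is not reproduced here, and all names and
numbers are hypothetical.

```python
def expected_aa_identity(M, freqs, aa_of):
    """Expected fraction of positions whose amino acid is unchanged.

    Sums freqs[i] * M[i][j] over all codon pairs (i, j) that code for the
    same amino acid, i.e. identical and synonymous codons.
    """
    total = 0.0
    for i, fi in enumerate(freqs):
        for j, pij in enumerate(M[i]):
            if aa_of[i] == aa_of[j]:
                total += fi * pij
    return total


# Toy data: codons 0 and 1 are synonymous, codon 2 codes for something else.
M = [[0.90, 0.08, 0.02],
     [0.08, 0.90, 0.02],
     [0.05, 0.05, 0.90]]
freqs = [0.4, 0.4, 0.2]
aa_of = ["L", "L", "F"]
print(expected_aa_identity(M, freqs, aa_of))
```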
CodonToA
Function CodonToA
Calling Sequence: CodonToA(triple)
Parameters:
Name Type
------------------------------------
triple a 3 letter DNA/RNA sequence
Returns:
one letter amino acid description
Synopsis: This function converts a 3 letter DNA/RNA sequence into the amino
acid specified by the genetic code. It returns $ when the given codon
corresponds to the stop codon.
Examples:
> CodonToA('UUU');
F
See Also:
?AAAToInt ?BaseToInt ?CIntToCodon ?IntToA ?IntToCInt
?aminoacids ?BBBToInt ?CIntToInt ?IntToAAA ?IntToCodon
?AminoToInt ?BToInt ?CodonCode ?IntToAmino
?AToCInt ?CIntToA ?CodonToCInt ?IntToB
?AToCodon ?CIntToAAA ?CodonToInt ?IntToBase
?AToInt ?CIntToAmino ?GeneticCode ?IntToBBB
CodonToCInt
Function CodonToCInt - convert a 3-letter codon into an integer
Calling Sequence: CodonToCInt(code)
Parameters:
Name Type Description
----------------------------------------------------------------
code string three nucleic (DNA, RNA) bases (one letter each)
Returns:
0..64
Synopsis: The 64 different codons over the alphabet {A, C, G, T=U} are
ordered from 1..64. This function converts a codon to a number between
1..64. If it contains an invalid base or an X, it returns 0.
Examples:
> CodonToCInt('TTT');
64
See Also:
?AAAToInt ?BaseToInt ?CIntToCodon ?IntToA ?IntToCInt
?aminoacids ?BBBToInt ?CIntToInt ?IntToAAA ?IntToCodon
?AminoToInt ?BToInt ?CodonCode ?IntToAmino
?AToCInt ?CIntToA ?CodonToA ?IntToB
?AToCodon ?CIntToAAA ?CodonToInt ?IntToBase
?AToInt ?CIntToAmino ?GeneticCode ?IntToBBB
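Illustration (Python, not Darwin): one numbering that is consistent with the
example above (TTT maps to 64). It assumes the bases are ordered A, C, G, T
with the first base most significant, so that AAA would map to 1; only the
TTT value is confirmed by this help text.

```python
BASES = "ACGT"


def codon_to_cint(codon):
    """Map a codon to 1..64, or 0 if it contains an invalid base."""
    codon = codon.upper().replace("U", "T")  # treat RNA as DNA (T=U)
    if len(codon) != 3 or any(b not in BASES for b in codon):
        return 0
    a, b, c = (BASES.index(x) for x in codon)
    # Base-4 number with the first base as the most significant digit.
    return 16 * a + 4 * b + c + 1


print(codon_to_cint("TTT"))
print(codon_to_cint("TTX"))
```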
CodonToInt
Function CodonToInt
Option: builtin
Calling Sequence: CodonToInt(UUU)
Parameters:
Name Type
--------------------------------------------------
UUU a three RNA base sequence (one letter each)
Returns:
1..22
Synopsis: This function converts a three RNA base sequence to the amino acid
number it specifies according to the standard genetic code. If the triplet
is unknown, the value 21 is returned. If it is a stop codon, it returns 22.
Examples:
> CodonToInt('UUU');
14
See Also:
?AAAToInt ?BaseToInt ?CIntToCodon ?IntToA ?IntToCInt
?aminoacids ?BBBToInt ?CIntToInt ?IntToAAA ?IntToCodon
?AminoToInt ?BToInt ?CodonCode ?IntToAmino
?AToCInt ?CIntToA ?CodonToA ?IntToB
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase
?AToInt ?CIntToAmino ?GeneticCode ?IntToBBB
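Illustration (Python, not Darwin): a sketch of the same mapping using the
standard genetic code. The amino acid numbering (A=1, R=2, ..., V=20) is
taken from the order in which the CodonUsage example in this help file lists
the amino acids; 21 and 22 follow the synopsis above. Table and function
names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_BY_CODON = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                  "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"


def codon_to_int(codon):
    """Return 1..20 for an amino acid, 21 if unknown, 22 for a stop codon."""
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3 or any(b not in BASES for b in codon):
        return 21
    i = (16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1])
         + BASES.index(codon[2]))
    aa = AMINO_BY_CODON[i]
    return 22 if aa == "*" else AA_ORDER.index(aa) + 1


print(codon_to_int("UUU"))
```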
CodonUsage
Function CodonUsage - Get Codon Usage for a particular amino acid
Calling Sequence: CodonUsage()
CodonUsage(dna)
Parameters:
Name Type Description
---------------------------
dna string coding DNA
Returns:
list
Synopsis: Gets the codon usage for a sequence of coding DNA. If no argument
is given, the function gets the codon usage for all entries in the loaded
database.
Examples:
> CodonUsage();
[[[GCA, 0], [GCC, 0], [GCG, 0], [GCT, 0]], [[AGA, 0], [AGG, 0], [CGA, 0], [CGC, 0], [CGG, 0], [CGT, 0]], [[AAC, 0], [AAT, 0]], [[GAC, 0], [GAT, 0]], [[TGC, 0], [TGT, 0]], [[CAA, 0], [CAG, 0]], [[GAA, 0], [GAG, 0]], [[GGA, 0], [GGC, 0], [GGG, 0], [GGT, 0]], [[CAC, 0], [CAT, 0]], [[ATA, 0], [ATC, 0], [ATT, 0]], [[CTA, 0], [CTC, 0], [CTG, 0], [CTT, 0], [TTA, 0], [TTG, 0]], [[AAA, 0], [AAG, 0]], [[ATG, 0]], [[TTC, 0], [TTT, 0]], [[CCA, 0], [CCC, 0], [CCG, 0], [CCT, 0]], [[AGC, 0], [AGT, 0], [TCA, 0], [TCC, 0], [TCG, 0], [TCT, 0]], [[ACA, 0], [ACC, 0], [ACG, 0], [ACT, 0]], [[TGG, 0]], [[TAC, 0], [TAT, 0]], [[GTA, 0], [GTC, 0], [GTG, 0], [GTT, 0]], [[XXX, 1]], [[TAA, 0], [TAG, 0], [TGA, 0]]]
See also: ?CodonCount ?RSCU
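Illustration (Python, not Darwin): a minimal sketch of counting codons in a
coding sequence. Unlike CodonUsage, which groups the counts by amino acid,
this sketch returns a flat codon-to-count mapping and ignores trailing bases
that do not form a complete codon; that handling is an assumption.

```python
from collections import Counter


def codon_usage(dna):
    """Count in-frame codons of a coding DNA string."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop an incomplete trailing codon
    return Counter(dna[i:i + 3] for i in range(0, usable, 3))


usage = codon_usage("ATGGCAGCATAA")
print(usage["GCA"], usage["ATG"], usage["TAA"])
```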
Collapse
Function Collapse( g:Graph )
Collapses cycles in g by removing edges.
CollapseNodes
Function CollapseNodes
Calling Sequence: CollapseNodes(tree,PAM = pam)
CollapseNodes(tree,NodeCount = ncount)
CollapseNodes(tree,Class = class)
CollapseNodes(tree,Bootstrapping = boots)
Parameters:
Name Type Description
-----------------------------------------------------------------
tree Tree
pam positive PAM distance
ncount posint Number of nodes
class {string,list(string)} Lineage(s)
boots posint Minimal bootstrapping percentage
Returns:
Tree
Synopsis: Collapses subtrees to a single leaf. With the PAM option, all
leaves that are at most the desired PAM distance from each other are
collapsed. The NodeCount option collapses all subtrees with at most this
number of leaves. The Class option is used for species trees to
collapse leaves that are from the same class of species. Finally,
Bootstrapping collapses all subtrees where all nodes are at least boots%
supported by bootstrapping.
Examples:
> tree := Tree(Leaf(Mouse,-2.9000),0,Tree(Leaf(Human,-2.2000),-0.6000,
Leaf(Dog,-2.7000)));
tree := Tree(Leaf(Mouse,-2.9000),0,Tree(Leaf(Human,-2.2000),-0.6000,Leaf(Dog,-2.7000)))
> CollapseNodes(tree,PAM=4);
Tree(Leaf(Mouse,-2.9000),0,Leaf(Human/Dog,-2.4500))
See also: ?BootstrapTree ?PrintTreeSeq ?Tree
CollectStat
Function CollectStat - collect and summarize Stat structures
Calling Sequence: CollectStat(data)
Parameters:
Name Type Description
-------------------------------------------------------------------
data anything any structure/list/set containing Stat structures
Returns:
list(Stat)
Synopsis: CollectStat will inspect any list/structure/set and collect all the
Stat structures together. The Stat structures will be union-ed whenever
they have the same description. This provides an easy way of adding
together several simulation results, which have been obtained in different
runs.
See also: ?OutsideBounds ?Stat ?union ?UpdateStat
Color
Class Color - structure to define the color of some document part
Template: Color(colcode,doc1,...)
Fields:
Name Type Description
-----------------------------------------------------
colcode string a color name or its hex RGB values
Returns:
Color
Methods: Color_type HTMLC LaTeXC string
Synopsis: The Color data structure holds document parts that are to be
displayed in that color. The number of arguments is variable.
Examples:
> Color( red, 'Your balance is negative' );
Color(red,Your balance is negative)
See Also:
?Block ?HyperLink ?PostscriptFigure ?string_RGB
?Code ?Indent ?print ?Table
?Copyright ?LastUpdatedBy ?RGB_string ?TT
?DocEl ?latex ?Roman ?View
?Document ?List ?RunDarwinSession
?HTML ?Paragraph ?screenwidth
ColorPalette
Function ColorPalette - creates a set of colors according to a colormap
Calling Sequence: ColorPalette(n)
ColorPalette(n,map)
Parameters:
Name Type Description
------------------------------------------------------------
n posint the number of different colors to be created
map string (optional) a colormap
Returns:
list([nonnegative, nonnegative, nonnegative])
Synopsis: This function computes n different colors according to a colormap
and returns their RGB values between [0,1]. The possible colormaps are
described below and completely specify the appearance of the colors:
The map parameter can be one of the following colormaps:
jet
jet jet ranges from blue to red, and passes through the colors cyan,
yellow, and orange. It is a variation of the hsv colormap.
hsv hsv varies the hue component of the hue-saturation-value color
model. The colors begin with red, pass through yellow, green,
cyan, blue, magenta, and return to red. The colormap is
particularly appropriate for displaying periodic functions.
heat heat varies the color from a saturated blue through white into a
saturated red. This map is useful for heatmaps, where negative
and positive values are possible.
stoplight stoplight gives colors from red through yellow to green.
lines lines gives a list of distinct colors.
Examples:
> colors := ColorPalette(10);
colors := [[0, 0, 1], [0, 0.4444, 1], [0, 0.8889, 1], [0, 1, 0.6667], [0, 1, 0.2222], [0.2222, 1, 0], [0.6667, 1, 0], [1, 0.8889, 0], [1, 0.4444, 0], [1, 0, 0]]
See Also:
?BrightenColor ?DrawPointDistribution ?Set
?DrawDistribution ?DrawStackedBar ?SmoothData
?DrawDotplot ?DrawTree ?StartOverlayPlot
?DrawGraph ?GetColorMap ?StopOverlayPlot
?DrawHistogram ?Plot2Gif ?ViewPlot
?DrawPlot ?PlotArguments
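Illustration (Python, not Darwin): a piecewise-linear interpolation through
blue, cyan, green, yellow and red that reproduces the example output above
for n=10. This was derived from that example only; whether Darwin computes
the jet colormap this way for other n is an assumption.

```python
def jet_palette(n):
    """Return n RGB triples in [0,1], interpolated from blue to red."""
    anchors = [(0, 0, 1), (0, 1, 1), (0, 1, 0), (1, 1, 0), (1, 0, 0)]
    colors = []
    for i in range(n):
        # Position along the 4 colour segments, from 0 to 4.
        t = 0.0 if n == 1 else 4.0 * i / (n - 1)
        seg = min(int(t), 3)
        frac = t - seg
        lo, hi = anchors[seg], anchors[seg + 1]
        colors.append([round(a + (b - a) * frac, 4) for a, b in zip(lo, hi)])
    return colors


for rgb in jet_palette(10):
    print(rgb)
```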
Complement
Function Complement - complement of a DNA sequence
Calling Sequence: Complement(nuc)
Parameters:
Name Type Description
-----------------------------------------
nuc string a string of DNA/RNA bases
Returns:
string
Synopsis: Computes the complement DNA/RNA of the given sequence. For
clarity, the antiparallel of AACC is GGTT. The reverse of AACC is CCAA and
the Complement of AACC is TTGG. The Complement of a DNA sequence does not
form a double helix with the sequence.
Examples:
> Complement('ACTTACG');
TGAATGC
See Also:
?AltGenCode ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToCInt
?AminoToInt ?BBBToInt ?CIntToCodon ?GeneticCode ?IntToCodon
?antiparallel ?BToInt ?CIntToInt ?IntToB ?Reverse
?AToCInt ?CIntToA ?CodonToA ?IntToBase
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBBB
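Illustration (Python, not Darwin): the three operations distinguished in the
synopsis, sketched for DNA. The function names mirror the Darwin ones but
are plain Python; U is mapped to A and the output always uses T, which is a
simplification for RNA input.

```python
_PAIR = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def complement(nuc):
    """Base-by-base complement, same reading direction."""
    return nuc.translate(_PAIR)


def reverse(nuc):
    """Same bases, opposite reading direction."""
    return nuc[::-1]


def antiparallel(nuc):
    """Reverse complement: the strand that pairs with nuc in a double helix."""
    return reverse(complement(nuc))


print(complement("ACTTACG"))
print(complement("AACC"), reverse("AACC"), antiparallel("AACC"))
```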
ComplementSequence
Function ComplementSequence
Calling Sequence: ComplementSequence(offset)
Parameters:
Name Type
----------------
offset integer
Returns:
integer : integer
Synopsis: Returns the numeric offset of the sequence that offset is pointing
to and the negative offset of the original sequence passed to GetComplement.
See Also:
?AltGenCode ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB
?AminoToInt ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt
?antiparallel ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon
?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase
Complex
Function Complex( Re:numeric, Im:numeric )
Data structure Complex( Re, Im )
Representation of complex numbers by a pair of numerical
arguments, the real part and the imaginary part.
- Operations:
Initialization: a := Complex(1,1);
b := Complex(0,1);
All arithmetic operations:
a+b, a-b, a*b, a/b, a^b
Special functions exp(a), ln(a), sin(a), cos(a), tan(a)
Printing: print(a);
printf( '%.3f', a );
Type testing: type(a,Complex);
- Conversions:
To string : string(a)
- Selectors:
a[Re] : real part
a[Im] : imaginary part
ComputeCAI
Function ComputeCAI - Compute Codon Adaptation Index
Calling Sequence: ComputeCAI(e)
Parameters:
Name Type Description
---------------------------------------
e {Entry,string} dna information
Returns:
numeric
Synopsis: Computes the CAI (codon adaptation index) for a DNA string or an
entry (with DNA tag). The Relative Adaptiveness (RA) must be calculated
before calling ComputeCAI.
See also: ?SetupRA
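Illustration (Python, not Darwin): a sketch of the usual definition of the
CAI as the geometric mean of the relative adaptiveness values of the codons
in a sequence. That Darwin uses exactly this definition, and how it treats
codons without an RA value, are assumptions; the ra dictionary stands in for
whatever SetupRA computes.

```python
import math


def compute_cai(dna, ra):
    """Geometric mean of ra[codon] over the in-frame codons of dna."""
    dna = dna.upper()
    logs = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        w = ra.get(dna[i:i + 3])
        if w:  # skip codons with no (or zero) relative adaptiveness
            logs.append(math.log(w))
    if not logs:
        raise ValueError("no codon with a relative adaptiveness value")
    return math.exp(sum(logs) / len(logs))


# Hypothetical RA values for two alanine codons.
print(compute_cai("GCAGCC", {"GCA": 1.0, "GCC": 0.25}))
```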
ComputeCAIVector
Function ComputeCAIVector - Compute CAI for all AA individually
Calling Sequence: ComputeCAIVector(e)
Returns:
list
Synopsis: Computes the CAI for all codons in an entry.
See also: ?ComputeCAI ?SetupRA
ComputeCubicTSP
Function ComputeCubicTSP - compute Travelling Salesman Cycle (cubic time)
Option: builtin
Calling Sequence: ComputeCubicTSP(Dist,trials,p1..pk)
Parameters:
Name Type Description
--------------------------------------------------------------------------
Dist matrix(nonnegative) symmetric, square, distance matrix
trials posint number of random starting points (optional)
p1..pk list(posint) optional good solutions
Returns:
list(posint)
Synopsis: Compute a minimum distance cycle (symmetric travelling salesman
problem) with a heuristic O(n^3) algorithm. The second argument is
optional. If present it indicates the number of (random starting points)
trials that will be computed; the best cycle/tour of these will be returned.
The third ... kth arguments are also optional and are permutations of
integers which are good solutions to the TSP problem. These will be used as
seeds to build new (better) solutions. This is the default function used by
ComputeTSP. It should be used only when you can provide initial good
solutions or a different number of trials is desired.
See also: ?ComputeTSP
ComputeDimensionlessFit
Function ComputeDimensionlessFit - dimensionless fitting of a distance tree
Calling Sequence: ComputeDimensionlessFit(t,Dist,Var)
Parameters:
Name Type Description
-------------------------------------------------------------------
t Tree the given tree
t matrix(numeric) alternatively, the distances over the tree
Dist matrix(numeric) distance matrix
Var matrix(numeric) variances of the distances
Returns:
nonnegative
Synopsis: This function computes the Dimensionless fitting index of a set of
distances over a tree as in "A Dimensionless Fit Measure for Phylogenetic
Distance Trees", J Bioinform Comput Biol, vol 3, pp 1429-1440. Trees built
over the same set of species, even with radically different methods, can be
ranked by the quality of their fit with this index. The input can be a tree
(which is converted to an actual distance matrix with Tree_matrix) or an
actual distance matrix. The input matrices Dist and Var are the distance
measured between the species and their variance.
See Also:
?BootstrapTree ?LeastSquaresTree ?RBFS_Tree ?Synteny
?GapTree ?PhylogeneticTree ?SignedSynteny ?Tree_matrix
ComputeQuarticTSP
Function ComputeQuarticTSP - compute Travelling Salesman Cycle (quartic time)
Option: builtin
Calling Sequence: ComputeQuarticTSP(Dist,trials,p1..pk)
Parameters:
Name Type Description
--------------------------------------------------------------------------
Dist matrix(nonnegative) symmetric, square, distance matrix
trials posint number of random starting points (optional)
p1..pk list(posint) optional good solutions
Returns:
list(posint)
Synopsis: Compute a minimum distance cycle (symmetric travelling salesman
problem) with a heuristic O(n^4) algorithm. The second argument is
optional. If present it indicates the number of (random starting points)
trials that will be computed; the best cycle/tour of these will be returned.
The third ... kth arguments are also optional and are permutations of
integers which are good solutions to the TSP problem. These will be used as
seeds to build new (better) solutions.
See also: ?ComputeTSP
ComputeTPI
Function ComputeTPI - TPI index of a DNA sequence
Calling Sequence: ComputeTPI(e,mode)
Parameters:
Name Type Description
------------------------------------------------------
e string an Entry which contains a DNA sequence
mode string (optional), the string AllAA
Returns:
list({list,numeric})
Synopsis: The TPI index measures how much correlation there exists among the
consecutive tRNAs coding for each amino acid. This autocorrelation is
measured in a way that it is insensitive to different frequencies of amino
acids, different frequencies of tRNAs and different frequencies of bases.
Two indices are computed, which are two representations of the same
magnitude. In both cases, the TPI measures the cumulative distribution of
the number of pairs of consecutive tRNAs coding for the same amino acids.
If the actual number of pairs is too low, this means that the tRNAs are
"rotated" around quite often. If the number of pairs is high, this means
that the tRNAs are "reused" often. The first value returned is a normal
deviate with the same cumulative probability of having the observed number
of pairs. A negative value means less correlation than expected, a positive
value higher correlation than expected. Since it is a normal deviate, N(0,
1), it is easy to estimate how rare the values are. E.g. 1.96 means that it
is higher only 2.5% of the time, etc. The second value is the cumulative
probability of having the observed number of pairs spread over the interval
-1 .. 1.
The function cannot compute the TPI unless the tRNA information is given.
This is normally done with the function SetuptRNA.
If a second argument, AllAA, is given, then both indices are computed for all
the individual amino acids as well as for the ensemble. In this case, a
list of lists is returned, where each component is a list with the two
values and the name of the amino acid.
See also: ?SetuptRNA ?TPIDistr
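Illustration (Python, not Darwin): how the first returned value, a standard
normal deviate, can be read as a tail probability, as in the 1.96 example of
the synopsis. The mapping of a cumulative probability p onto -1 .. 1 as
2*p-1 is an assumption about the second value, not taken from Darwin.

```python
from statistics import NormalDist


def upper_tail(z):
    """Probability that an N(0,1) deviate exceeds z."""
    return 1.0 - NormalDist().cdf(z)


def spread_to_unit_interval(z):
    """Cumulative probability rescaled to -1 .. 1 (assumed to be 2*p - 1)."""
    return 2.0 * NormalDist().cdf(z) - 1.0


print(round(upper_tail(1.96), 3))
print(round(spread_to_unit_interval(1.96), 3))
```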
ComputeTSP
Function ComputeTSP
Calling Sequence: ComputeTSP(D)
Parameters:
Name Type Description
--------------------------------------------------
D matrix(numeric) symmetric distance matrix
Returns:
list(posint)
Global Variables: ComputeTSP_table
Synopsis: This function computes a minimum distance tour through the distance
matrix D (this is the symmetric travelling salesperson problem).
Examples:
> D := [[0,1,1,10],[1,0,10,1],[1,10,0,1],[10,1,1,0]];
D := [[0, 1, 1, 10], [1, 0, 10, 1], [1, 10, 0, 1], [10, 1, 1, 0]]
> ComputeTSP(D);
[1, 2, 4, 3]
See also: ?ComputeCubicTSP ?ComputeQuarticTSP
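Illustration (Python, not Darwin): an exhaustive search that confirms the
tour in the example above is optimal for that matrix. This is not the
heuristic Darwin uses; brute force is only feasible for a handful of cities.
Tours are lists of 1-based indices, as in the Darwin output.

```python
from itertools import permutations


def tour_length(dist, tour):
    """Length of the closed cycle visiting the 1-based indices in tour."""
    n = len(tour)
    return sum(dist[tour[i] - 1][tour[(i + 1) % n] - 1] for i in range(n))


def brute_force_tsp(dist):
    """Return (length, tour) of a shortest cycle, fixing city 1 as the start."""
    n = len(dist)
    best = None
    for rest in permutations(range(2, n + 1)):
        tour = [1, *rest]
        length = tour_length(dist, tour)
        if best is None or length < best[0]:
            best = (length, tour)
    return best


D = [[0, 1, 1, 10], [1, 0, 10, 1], [1, 10, 0, 1], [10, 1, 1, 0]]
print(brute_force_tsp(D))
```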
ConcatStrings
Function ConcatStrings
Calling Sequence: ConcatStrings(slist,sep)
Parameters:
Name Type Description
--------------------------------------------
slist array(string) array of strings
sep string (optional) separator
Returns:
string
Synopsis: Concatenates a list of strings to one string. The optional second
argument can be a separator character which is inserted between any two
substrings. This method is much more efficient than repeatedly appending to
a string.
Examples:
> ConcatStrings(['Hello ','World','!']);
Hello World!
> ConcatStrings(['A','B','C'],', ');
A, B, C
See also: ?RenderTemplate ?string ?trim
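Illustration (Python, not Darwin): the same idea in Python, where joining a
list once is also preferable to appending in a loop, since each append may
copy the string built so far. The function name is hypothetical.

```python
def concat_strings(slist, sep=""):
    """Concatenate the strings in slist, inserting sep between neighbours."""
    return sep.join(slist)


print(concat_strings(["Hello ", "World", "!"]))
print(concat_strings(["A", "B", "C"], ", "))
```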
ConnectTcp
Function ConnectTcp
Option: builtin
Calling Sequence: ConnectTcp(path,slave)
Parameters:
Name Type Description
-------------------------------------
path string path to a UNIX pipe
slave boolean
Returns:
NULL
Synopsis: Creates a connection to IPC daemon at path (a UNIX pipe). slave must
be false for all darwin processes not created by the daemon.
Examples:
> r := traperror(ConnectTcp('/tmp/.ipc/darwin', false));
> SendTcp('PING'); r := ReceiveTcp(3);
r := PING OK
> SendTcp('MSTAT linneus1'); r := ReceiveTcp(3);
r := DATA linneus1 0:OK ALIVE
> DisconnectTcp();;
See Also:
?darwinipc ?ParExecuteIPC ?ReceiveDataTcp ?SendTcp
?DisconnectTcp ?ParExecuteSlave ?ReceiveTcp
?ipcsend ?ParExecuteTest ?SendDataTcp
ConsistentGenome
Class ConsistentGenome - check the consistency of a database file
Template: ConsistentGenome(name)
Fields:
Name Type Description
-------------------------------------------------
name string 5-letter name of a species/genome
Returns:
NULL
Methods: ConsistentGenome_type
Synopsis: This function checks various aspects of consistency of a database
file which contains a single genome. The database should have been loaded
with ReadDb before calling this function. Various header fields should be
present, (SCINAME, KINGDOM, 5LETTERNAME and optionally ALTGENETICCODE). The
entries should also contain a DNA entry, which is checked to be in
accordance with the protein sequence. This function will print error
messages of the inconsistencies found. For some errors, like identically
duplicated sequences, it will print editor commands (vi) to correct the
problems.
See also: ?database ?DB ?Entry ?GenomeSummary ?ReadDb ?Sequence
Counter
Class Counter - accumulates values
Template: Counter()
Counter(title)
Returns:
Counter
Fields:
Name Type Description
--------------------------------------------------
value numeric accumulated value of the counter
title string user-defined description
Methods: Counter_type plus printf Rand string times
Synopsis: A Counter is an object which stores a number. It is understood
that this number is incremented occasionally, that is the purpose of a
counter. It is possible to have as many counters as we want, each one with
its own description. The way to increment a Counter is by adding a value to
the structure. E.g. c1+1, will increment the Counter c1 by 1. The counter
is incremented as a side effect of the addition (or subtraction). The
result of the expression is the total accumulated so far.
A Counter can also be multiplied by a numerical factor. This has little use in
practice, except for multiplying by zero, which erases the Counter. E.g.
0*c1
Examples:
> c1 := Counter(iterations): c2 := Counter('Normal numbers'):
> to 100 do c1+1; c2+Rand(Normal) od:
> print(c1,c2);
iterations: 100
Normal numbers: 11.532789
See also: ?LinearRegression ?objectorientation ?Stat
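Illustration (Python, not Darwin): a sketch of an object whose addition
updates it as a side effect and returns the running total, as described in
the synopsis. The class name TallyCounter is hypothetical, and the printed
format only imitates the Darwin output.

```python
class TallyCounter:
    """Accumulates a number; c + x updates c in place and returns the total."""

    def __init__(self, title=""):
        self.title = title
        self.value = 0

    def __add__(self, amount):
        self.value += amount
        return self.value

    __radd__ = __add__

    def __sub__(self, amount):
        self.value -= amount
        return self.value

    def __mul__(self, factor):
        # Multiplying by zero erases the counter.
        self.value *= factor
        return self.value

    __rmul__ = __mul__

    def __str__(self):
        return f"{self.title}: {self.value}"


c1 = TallyCounter("iterations")
for _ in range(100):
    c1 + 1  # side effect: increments c1
print(c1)
0 * c1      # resets c1
print(c1)
```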
Covariance
Class Covariance - store and compute covariances and correlations
Template: Covariance()
Covariance(Description)
Covariance(Description,VarNames)
Fields:
Name Type Description
------------------------------------------------------------------------
Description string descriptive name of the set
VarNames list names of the variables
Mean list(numeric) mean values of the variables
Variance list(numeric) variances of the variables
Minimum list(numeric) minimum values of the variables
Maximum list(numeric) maximum values of the variables
Number integer number of sample points recorded
MaxVariance [numeric, list] largest eigenvalue/eigenvector
Eigenvalues [list, matrix] eigenvalues/vectors of covariance matrix
CovMatrix matrix(numeric) estimated covariance matrix
CorrMatrix matrix(numeric) estimated correlation matrix
Returns:
Covariance
Methods: Covariance_type plus print Rand select string update
Synopsis: Covariance is a data structure which stores the values of vectors
of variables, and upon demand selects/computes various results. A call to
Covariance sets up the space to record the information. Calls to
Covariance_update or adding a value to a Covariance variable record additional
results. At any point selections can be made, resulting in computations.
Further data can be added and further selections can be made. Each data
point should be a numerical vector of dimension m. The CovMatrix selector
returns an unbiased estimator of the covariances of the variables. The
diagonal of this matrix contains the estimates of the variances of the
variables. The CorrMatrix selector returns an unbiased estimator of the
correlation coefficients of the variables. Its diagonal is 1. MaxVariance
returns the largest eigenvalue of the covariance matrix and its
corresponding eigenvector. This vector gives the linear combination of the
variables that will show the largest variance. The Eigenvalues selector
returns a list, [e,v], with the eigenvalues and the eigenvectors of the
covariance matrix. e is sorted in increasing order ( e[1]<=e[2]<=...<=e[m]
) and v is the array of eigenvectors (each row is an eigenvector, v is an m
x m matrix). Covariance analysis is useful to find which are the linear
combinations of the data which give the maximum/minimum variances. If a is
a data point (a vector of dimension m), then a*v[i] has variance e[i]. If
the data has linear dependencies, then some linear combinations will have 0
variance. Then the smallest e value will be 0 (or roundoff error from 0).
The number of 0 (or near 0) eigenvalues is the number of linear dependencies
in the data.
Examples:
> c := Covariance('test two vars',[v1,v2]):
> c+[0,1]: c+[0.1,1.1]: c+[0.2,1.2]: c+[0,1.3]:
> print(c);
Covariance analysis for test two vars, 4 data points
v1 v2
Means 0.0750 1.1500
Covariance matrix
v1 0.0092
v2 0.0017 0.0167
> c[Eigenvalues];
[[0.00881298, 0.01702036], [[0.9782, -0.2076], [0.2076, 0.9782]]]
See also: ?Counter ?Eigenvalues ?OutsideBounds ?Stat ?SvdAnalysis
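Illustration (Python, not Darwin): the unbiased covariance estimate (divisor
n-1) and the eigenvalues for the four data points of the example above. The
closed-form eigenvalue formula used here only applies to symmetric 2 x 2
matrices; function names are hypothetical.

```python
import math


def covariance_matrix(points):
    """Return (means, unbiased covariance matrix) of a list of m-vectors."""
    n, m = len(points), len(points[0])
    means = [sum(p[k] for p in points) / n for k in range(m)]
    cov = [[sum((p[r] - means[r]) * (p[c] - means[c]) for p in points) / (n - 1)
            for c in range(m)] for r in range(m)]
    return means, cov


def eigenvalues_2x2(cov):
    """Eigenvalues of a symmetric 2x2 matrix, in increasing order."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return [half_trace - radius, half_trace + radius]


data = [[0, 1], [0.1, 1.1], [0.2, 1.2], [0, 1.3]]
means, cov = covariance_matrix(data)
print([round(x, 4) for x in means])
print([[round(x, 4) for x in row] for row in cov])
print([round(e, 8) for e in eigenvalues_2x2(cov)])
```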
CreateArray
Function CreateArray - Creates an array of defined length and initialization
Option: builtin
Calling Sequence: CreateArray(1..n1,1..n2,1..nk)
CreateArray(1..k,z)
Parameters:
Name Type Description
---------------------------------------------------
ni integer integer dimensions of the array
k integer integer dimension of the array
z anything initialization value of the array
Returns:
list
Synopsis: This function creates a new array of dimension specified by k. If
the last argument to CreateArray is not of type range, this is the initial
value assigned to each element of the array.
Examples:
> x := CreateArray(1..5, 4);
x := [4, 4, 4, 4, 4]
> y := CreateArray(1..2, 1..2, [3,4]);
y := [[[3, 4], [3, 4]], [[3, 4], [3, 4]]]
See also: ?CreateString
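Illustration (Python, not Darwin): a sketch of building a nested list with
the given dimensions and initial value, mirroring the two examples above.
The default initial value of 0 and the copying of the initial value into
each cell are assumptions.

```python
import copy


def create_array(dims, init=0):
    """Nested list with the given dimensions, every cell a copy of init."""
    if not dims:
        # Copy so that cells holding a list do not share one object.
        return copy.deepcopy(init)
    return [create_array(dims[1:], init) for _ in range(dims[0])]


x = create_array([5], 4)
y = create_array([2, 2], [3, 4])
print(x)
print(y)
```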
CreateCodonMatrices
Function CreateCodonMatrices - creates a global list of codon mutation
matrices.
Calling Sequence: CreateCodonMatrices()
CreateCodonMatrices(setname)
CreateCodonMatrices(counts)
CreateCodonMatrices(rates,freqs)
Parameters:
Name Type Description
------------------------------------------------------------------
setname string Name of the desired set of species.
counts matrix(numeric) Matrix with codon mutation counts.
rates matrix(numeric) a rate matrix Q
freqs array(nonnegative) codon frequencies
Returns:
NULL
Global Variables: AF CF CM CMS CodonLogPAM1 DM DMS logPAM1
Synopsis: When called with a set name, the precomputed logarithm of the
respective mutation matrices are loaded and used to create the global
scoring matrices. When called with no argument, the matrices are created
from the data from the OMA project. Alternatively, 'mt' can be used as
setname to construct matrices for mitochondrial coding DNA. When a count
matrix is given, the mutation matrices are derived from this matrix. When a
rate matrix and the natural frequencies are given, then those are used to
create the scoring matrices. The function creates the following global
objects:
CF - a vector of length 64 containing the codon frequencies,
CodonLogPAM1 - the logarithm of a 1-CodonPAM mutation matrix,
CM - the 250-CodonPAM similarity matrix and
CMS - a list of 1266 similarity matrices.
Examples:
> CreateCodonMatrices();
> CreateCodonMatrices(hum);
See Also:
?CodonAlign ?CreateDayMatrices ?EstimateCodonPAM
?CodonDynProgStrings ?CreateSynMatrices
CreateCodonModelMatrices
Function CreateCodonModelMatrices - Creates a set of CodonPAM1 matrices
according to the M-series codon models.
Calling Sequence: CreateCodonModelMatrices(model,freq,kappa,w)
CreateCodonModelMatrices(model,freq,kappa,w,props)
CreateCodonModelMatrices(model,freq,kappa,w,props,p,q)
Parameters:
Name Type Description
-----------------------------------------------------------------------------------
model {M0,M2,M3,M8} type of substitution model
freq list(nonnegative) frequency vector
kappa nonnegative transition/transversion ratio
w {nonnegative,set(nonnegative)} dN/dS ratio(s)
props {nonnegative,list(nonnegative)} (for model <> M0) proportion(s)
p positive (for M8) p parameter of Beta distribution
q positive (for M8) q parameter of Beta distribution
Returns:
list(matrix)
Synopsis: The function CreateCodonModelMatrices creates a set of codon
substitution matrices according to the M-series codon models M0, M1/2, M3,
M7/8 by Yang.
To create matrices for M1 using M2, set w to 1; to create matrices for M7
using M8, set props to 0 and only use elements 1..10 of the list returned by
the function.
Examples:
See also: ?CreateParametricQMatrix
CreateDayMatrices
Function CreateDayMatrices - Create all the Dayhoff matrices needed
Calling Sequence: CreateDayMatrices()
CreateDayMatrices(Name)
CreateDayMatrices(Counts)
CreateDayMatrices(Q,freqs)
Parameters:
Name Type Description
-----------------------------------------------------------------------------------------------------------
Counts matrix (optional) a symmetric aa mutation count matrix
mapping procedure (optional) a mapping between symbols and posints
type = anything (optional) matrices will be of the given type
Q matrix (optional) a rate matrix
freqs array (optional) frequencies (if called with Q)
name string (optional) name of a substitution model (currently allowed are JTT, LG and WAG)
Returns:
NULL
Global Variables: AF DM DMS logPAM1
Synopsis: This function creates all the Dayhoff matrices needed for other
alignment functions to work. It performs the following four calculations:
(1) It assigns a Dayhoff matrix computed at PAM distance 250 to the global
variable DM.
(2) It computes 1266 Dayhoff matrices for various PAM distances between 0.049
and 1000 and assigns the list of such matrices to the global variable DMS.
(3) It computes the amino acid natural frequencies and assigns them to the
global variable AF.
(4) It assigns the global variable logPAM1 with the logarithm of the mutation
matrix (at PAM distance 1) being used.
By default, with no arguments, it uses the data derived from the entire
SwissProt database in Nov 1991 (Benner, Gonnet and Cohen). This can be
altered in four ways:
(a) by assigning the global variable NewLogPAM1 with the logarithm of a PAM 1
mutation matrix, all the computations will be based on this mutation matrix.
(b) by passing a count matrix as argument all the computations will be based
on this count matrix. A count matrix has the counts of mutations (and non
mutation on the diagonal) for a large sample of alignments. Normally if two
amino acids X and Y are aligned, we will add 1/2 to Counts[X,Y] and 1/2 to
Counts[Y,X].
(c) by calling the function with a rate matrix and a frequency vector, the
computations will be based on these parameters
(d) by calling the function with the name of a specific substitution model
(currently, JTT, LG and WAG are allowed). The computations will then be
based on that model.
If the counts are only on the amino acids A, C, G and T, (and the rest of the
counts are just 1 on the diagonal and 0 elsewhere), the Dayhoff matrices
produced are suitable to align DNA sequences. Actually this is the standard
and simplest way of aligning DNA sequences. The system knows about the
following count matrices, which can be used as argument of
CreateDayMatrices:
name Description
---------------------------------------------------------------------
HumanMtDNA Human mitochondrial DNA count matrix based on very short
PAM evolution, taken from 86 full mtDNA genomes
ViralRNA Counts matrix derived from 50 RNA viruses
Examples:
> CreateDayMatrices();
See Also:
?CreateCodonMatrices ?CreateOrigDayMatrix ?SearchDayMatrix
?CreateDayMatrix ?DayMatrix
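Illustration (Python, not Darwin): a sketch of how a symmetric count matrix
as described in (b) can be accumulated from aligned sequence pairs, adding
1/2 to Counts[X,Y] and 1/2 to Counts[Y,X] for each aligned pair of amino
acids. The amino acid order and the skipping of gaps and unknown symbols are
assumptions.

```python
AA = "ARNDCQEGHILKMFPSTWYV"


def count_matrix(aligned_pairs):
    """Symmetric 20x20 count matrix from pairs of equal-length aligned strings."""
    idx = {a: i for i, a in enumerate(AA)}
    counts = [[0.0] * 20 for _ in range(20)]
    for s1, s2 in aligned_pairs:
        for x, y in zip(s1, s2):
            if x in idx and y in idx:  # skip gaps and unknown symbols
                counts[idx[x]][idx[y]] += 0.5
                counts[idx[y]][idx[x]] += 0.5
    return counts


C = count_matrix([("AR", "AK")])
print(C[0][0], C[1][11], C[11][1])
```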
CreateDayMatrix
Function CreateDayMatrix
Option: builtin
Calling Sequence: CreateDayMatrix(LogMutMatrix,PamNumber)
Parameters:
Name Type Description
---------------------------------------------------------------------------
LogMutMatrix array(array(numeric)) logarithm of a 1-PAM mutation matrix
PamNumber numeric desired PAM distance of the result
posint..posint range of integer PAM distances
Returns:
{DayMatrix,list(DayMatrix)}
Synopsis: Computes a similarity scoring matrix (usually called Dayhoff
matrix) from the logarithm of a 1-PAM mutation matrix (LogMutMatrix) and a
PAM distance PamNumber. CreateDayMatrices() assigns the global variable
logPAM1 a logarithm of a 1-PAM mutation matrix. If the second argument is
an integer range, a list of PAM matrices with all the PAM values in the
range will be computed.
Examples:
> CreateDayMatrix( NewLogPAM1 , 250);
DayMatrix(Peptide, pam=250, Sim: max=14.152, min=-5.161, del=-19.814-1.396*(k-1))
See Also:
?CreateDayMatrices ?CreateOrigDayMatrix ?DayMatrix ?SearchDayMatrix
CreateMSAMethods
Function CreateMSAMethods( )
Creates a list of several default MSA methods
CreateOrigDayMatrix
Function CreateOrigDayMatrix
Option: builtin
Calling Sequence: CreateOrigDayMatrix(Mutations,AaCounts,PamNumber)
CreateOrigDayMatrix(mutations,counts,1..UpperPam)
Parameters:
Name Type
--------------------------------
Mutations array(numeric,20,20)
AaCounts array(numeric,20)
PamNumber numeric
Returns:
{DayMatrix,list(DayMatrix)}
Synopsis: This function computes a Dayhoff matrix (structured type DayMatrix)
by the method first given by Dayhoff et al. (1978), given an observed
mutation matrix Mutations, a frequency vector AaCounts and a PAM distance
PamNumber (or a range of PAM distances beginning at 1).
Examples:
> OrigTot := [87, 41, 40, 47, 33, 38, 50, 89, 34, 37, 85, 81, 15, 40, 51, 70, 58, 10, 30, 65];
OrigTot := [87, 41, 40, 47, 33, 38, 50, 89, 34, 37, 85, 81, 15, 40, 51, 70, 58, 10, 30, 65]
> OrigFreq := OrigTot/sum(OrigTot);
OrigFreq := [0.08691309, 0.04095904, 0.03996004, 0.04695305, 0.03296703, 0.03796204, 0.04995005, 0.08891109, 0.03396603, 0.03696304, 0.08491508, 0.08091908, 0.01498501, 0.03996004, 0.05094905, 0.06993007, 0.05794206, 0.00999001, 0.02997003, 0.06493506]
> OrigDM := CreateOrigDayMatrix(Mutations1978, OrigFreq, 250);
OrigDM := DayMatrix(Peptide, pam=250, Sim: max=17.302, min=-7.510, del=-19.814-1.396*(k-1))
See also: ?CreateDayMatrices ?CreateDayMatrix ?DayMatrix ?SearchDayMatrix
CreateParametricQMatrix
Function CreateParametricQMatrix - Creates a rate matrix from a frequency
vector, Ts/Tv and dN/dS.
Calling Sequence: CreateParametricQMatrix(f,k,w)
Parameters:
Name Type Description
--------------------------------------------------------
f list(nonnegative) frequency vector
k nonnegative transition/transversion ratio
w nonnegative dN/dS ratio
Returns:
matrix
Synopsis: The function CreateParametricQMatrix creates a rate matrix Q from
the frequencies and given kappa and w (omega) parameters.
Examples:
See also: ?CreateCodonModelMatrices
CreateRandMultAlign
Function CreateRandMultAlign - Random multiple alignment following a
phylogenetic tree
Calling Sequence: CreateRandMultAlign(tree,len)
CreateRandMultAlign(tree,len,method)
CreateRandMultAlign(tree,len,DelType)
Parameters:
Name Type Description
----------------------------------------------------------------------------
tree Tree Phylogenetic tree
len posint Length of root sequence
method string (optional) MSA method, default: Probabilistic
DelType {ExpGaps,ZipfGaps} (optional) mutation type, default: no gaps
Returns:
MAlignment
Synopsis: Produces a random multiple alignment that is generated from a
phylogenetic tree. The DelType is directly passed to the Mutate function,
while the method is used for the MAlign function.
Examples:
> tree := Tree(Leaf(A,-7.5000,1),0,Tree(Tree(Leaf(D,-8.5000,4),-7.5000,Leaf(C,-8.5000,3)),-4.5000,Leaf(B,-5.5000,2))):
> msa := CreateRandMultAlign(tree,200,ExpGaps);
dimensionless fitting index 0.0337
> print(msa);
Multiple sequence alignment:
----------------------------
Score of the alignment: 4498.2632
Maximum possible score: 4498.2632
A GTDQPFTNFNGINRFATPGFNPFGALLDNLSVGGVNHIAIEHSGEIEPSVRSNLVTYYVLEKKGFFPTGCVLAL
D GTDQPFTNFNGVGMFATPGFNPFGAALDNLSVGGINHIAIEHSGEIEPSVRSNLVTYYVLEKKGFFPTGCVLAL
C GTDQPFTNFNGVGMFATPGFNPFGAALDNLSVGGINHIAIEHSGEIEPSVRSNLVTYYVLEKKGFFPTGCVLAL
B GTDQPFTNFNGVGRFATPGFNPFGAALDDLSVGGVNHVAIEHSGEIEPSVRSNLVTYYVLEKKGFFPTGCVLAL
A LLDPLFLFVSPPECKVLNLFNAKTTVTDNNAPMPIMVSVGKEGADDYVFIHLSFHVPAWRAGDYRLCSSLEFTI
D LIDPLFLFVSPPECKVVNLFKAKTTVTNENAPMPIMVPVGAEGVDDYVFIHLSFHVPPWRAGDYRLCSSLEFTN
C LIDPLFLFVSPPECKVVNLFKAKTTVTNENAPMPIMVSVGAEGADDYVFIHLSFHVPPWRAGDYRLCSSLEFTN
B LIDPLFLFVSPPECKVVNLFKAKTTVTNENAPMPIMVSVGAEGADDYVFIHLSFHVPPWRAGDYRLCSSLEFTN
A FENTYWAPYIVTEIGRKRAETSANSQHGDRQSKEKGTRLMVLHTKGLTEPTA
D FENTYWAHYIVTEAGRKRAETSANSQHGDRQSKEKGTRLMVLNAKGLTEPTA
C FENTYWAHYIVTEAGRKRAETSANSQHGDRQSKEKGTRLMVLNAKGLTEPTA
B FENTYWAHYIVTEAGRKRAETSANSQHGDRQSKEKGTRLMVLNAKGLTEPTA
See Also:
?BootstrapTree ?MAlign ?Mutate
?CreateRandSeq ?MAlignment ?Tree
CreateRandPermutation
Function CreateRandPermutation
Calling Sequence: CreateRandPermutation(n)
Parameters:
Name Type
-------------
n posint
Returns:
list(integer)
Synopsis: Returns a random permutation of the integers from 1 to n.
Examples:
> CreateRandPermutation(5);
[2, 1, 5, 3, 4]
See Also:
?CreateRandSeq ?Permutation ?SetRand ?Shuffle
?Mutate ?Rand ?SetRandSeed
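For readers outside Darwin, the idea behind CreateRandPermutation can be sketched in Python with a Fisher-Yates shuffle. This is an illustration only: the name create_rand_permutation is hypothetical, and nothing here claims to reproduce Darwin's internal algorithm or its random number stream.

```python
import random

def create_rand_permutation(n, rng=random):
    """Return a random permutation of the integers 1..n (Fisher-Yates shuffle)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    perm = list(range(1, n + 1))
    # Walk from the last position down, swapping each element with a
    # uniformly chosen element at or before it.
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        perm[i], perm[j] = perm[j], perm[i]
    return perm

print(create_rand_permutation(5))
```

Every call returns each of the integers 1..n exactly once; only the order varies.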
CreateRandSeq
Function CreateRandSeq - create a random sequence
Calling Sequence: CreateRandSeq(len,F)
Parameters:
Name Type Description
---------------------------------------------
len posint sequence length
F array(numeric) character frequencies
Returns:
string
Synopsis: Given a list of frequencies of length 4, this function creates a
random nucleotide sequence of length len. When given a list of 20 (amino
acid) frequencies, it generates a random amino acid sequence. A list of
length 64 or 65 produces a random codon sequence without stop codons.
Examples:
> CreateRandSeq(20, [0.2, 0.3, 0.4, 0.1]);
AGGCCCCCGGACAAGCGGGA
See also: ?CreateRandPermutation ?Rand ?SetRand ?SetRandSeed ?Shuffle
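The sampling described in the CreateRandSeq synopsis can be sketched in Python as follows. The function name create_rand_seq and the letter order "ACGT" for a 4-element frequency vector are assumptions made for this sketch; Darwin's own symbol ordering may differ.

```python
import random

def create_rand_seq(length, freqs, alphabet="ACGT", rng=random):
    """Draw `length` independent characters with the given frequencies."""
    if len(freqs) != len(alphabet):
        raise ValueError("need one frequency per alphabet symbol")
    # random.choices samples with replacement, proportionally to the weights.
    return "".join(rng.choices(alphabet, weights=freqs, k=length))

print(create_rand_seq(20, [0.2, 0.3, 0.4, 0.1]))
```

Passing a 20-letter alphabet together with 20 frequencies gives the amino acid case in the same way.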
CreateString
Function CreateString - Creates a string of defined length and initialization
Option: builtin
Calling Sequence: CreateString(len)
CreateString(len,z)
Parameters:
Name Type Description
------------------------------------------------------------------------
len {0,posint} integer length of the string
z string initialization value of each character of the string
Returns:
string
Synopsis: Create a new string of the given length and initialize it, setting
each character to the initialization value (default: blank).
Examples:
> x := CreateString(6);
x :=
> y := CreateString(10, d);
y := dddddddddd
See also: ?CreateArray
CreateSynMatrices
Function CreateSynMatrices - Creates a global list of SynPAM matrices.
Calling Sequence: CreateSynMatrices()
CreateSynMatrices(setname)
Parameters:
Name Type Description
------------------------------------------
setname string name of predefined set.
Returns:
NULL
Global Variables: SynMS
Synopsis: When called with a set name, the precomputed count matrices are
loaded and used to create the global scoring matrices. By default, the
count matrix from the OMA project is used. The function then sets all
non-synonymous mutation counts to zero and uses this matrix to create the
global list SynMS with 1000 scoring matrices of various SynPAM distances.
Examples:
> CreateSynMatrices();
> CreateSynMatrices(mus);
See Also:
?CodonDynProgStrings ?CreateCodonMatrices ?EstimateSynPAM
?CodonMatrix ?CreateDayMatrices
CreateTreeConstruction
Function CreateTreeConstruction( type:string )
Creates a reasonable TreeConstruction data structure for the given type.
type may be one of the following:
prob, phylip, linear, dynamic
CreateTreeConstructions
Function CreateTreeConstructions( )
Creates a selection of tree construction algorithms.
An optional argument specifies the number of different algorithms:
SMALL: 4 different methods
MEDIUM: 14 methods
LARGE: 40 methods
CreateTreeStatistics
Function CreateTreeStatistics( Constructions:array(TreeConstruction), Trees:array(Tree) )
Creates an array of TreeStatistics of TreeConstruction and Tree
Cumulative
Function Cumulative - compute the cumulative probability for x
Option: polymorphic
Calling Sequence: Cumulative(distr,x)
Parameters:
Name Type Description
------------------------------------------------------------
distr anything description of a probability distribution
x numeric a number
Returns:
numeric
Synopsis: This function computes the probability that a randomly distributed
variable with distribution "distr" has a value less than or equal to x. This is
normally called the cumulative probability distribution. The result is
between 0 and 1 inclusive. The format describing the distribution is the
same as the one used by Rand. If x is continuously distributed, with
density f(x), then the cumulative is:
    Cumulative(f, x) = \int_{-\infty}^{x} f(t) \, dt
If the distribution is a discrete distribution, say over the integers, then
the cumulative is defined as:
    Cumulative(f, x) = \frac{1}{2} f(x) + \sum_{t=-\infty}^{x-1} f(t)
The system knows how to compute the Cumulative distributions of: {Binomial,
ChiSquare,LogIndepEvents,Normal,U}.
If the arguments are such that the value returned is too close to 1 or too
close to 0 for accurate representation, consider using CumulativeStd which
returns its result in equivalent standard deviations and will not suffer
from precision problems. The relations between the Cumulative(c) and the
CumulativeStd(s) are the following:
    c = \frac{1}{2} \left( 1 + \mathrm{erf}(s / \sqrt{2}) \right)
    c = 1 - \frac{1}{2} \, \mathrm{erfc}(s / \sqrt{2})
    c = \frac{1}{2} \, \mathrm{erfc}(-s / \sqrt{2})
References: Erdelyi53, Handbook of Mathematical functions, Abramowitz and
Stegun, 7.1
Examples:
> Cumulative( Binomial(10,0.5), 5 );
0.5000
> Cumulative( U(0,10), 7.5 );
0.7500
See Also:
?CumulativeStd ?ProbBallsBoxes ?Rand ?Std_Score
?OutsideBounds ?ProbCloseMatches ?StatTest
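The two definitions in the Cumulative synopsis can be checked against the examples with a short Python sketch. The helper names cumulative_binomial and cumulative_uniform are hypothetical; the code follows the formulas stated in the synopsis (including the half weight on f(x) in the discrete case) and is not Darwin's implementation.

```python
from math import comb

def cumulative_binomial(n, p, x):
    """Discrete case: 1/2 f(x) + sum of f(t) for t < x, with x an integer in 0..n."""
    def pmf(t):
        return comb(n, t) * p**t * (1 - p)**(n - t)
    below = sum(pmf(t) for t in range(0, x))
    return 0.5 * pmf(x) + below

def cumulative_uniform(a, b, x):
    """Continuous case: integral of the U(a,b) density from -infinity to x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

print(cumulative_binomial(10, 0.5, 5))   # compare with Cumulative(Binomial(10,0.5), 5)
print(cumulative_uniform(0, 10, 7.5))    # compare with Cumulative(U(0,10), 7.5)
```

The half weight on f(x) is what makes the symmetric Binomial(10,0.5) case come out at exactly one half for x = 5.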
CumulativeStd
Function CumulativeStd - cumulative probability in standard deviations
Option: polymorphic
Calling Sequence: CumulativeStd(distr,x)
Parameters:
Name Type Description
------------------------------------------------------------
distr anything description of a probability distribution
x numeric a number
Returns:
numeric
Synopsis: This function computes the probability that a randomly distributed
variable with distribution "distr" has a value less than or equal to x. This is
normally called the cumulative probability distribution. The result is
returned in standard deviations of an equivalent Normal(0,1) distribution.
This is useful when the result is exponentially close to 1 (or to 0) and
returning the probability would cause large truncation errors. The format
describing the distribution is the same as the one used by Rand. If x is
continuously distributed, with density f(x), then the cumulative is:
    Cumulative(f, x) = \int_{-\infty}^{x} f(t) \, dt
The system knows how to compute the Cumulative distributions of: {Binomial,
ChiSquare,LogIndepEvents,Normal,U}.
If the distribution is a discrete distribution, say over the integers, then
the cumulative is defined as:
    Cumulative(f, x) = \frac{1}{2} f(x) + \sum_{t=-\infty}^{x-1} f(t)
Examples:
> CumulativeStd( Binomial(10,0.5), 6 );
0.5995
> CumulativeStd( U(0,10), 9.75 );
1.9600
See Also:
?Cumulative ?ProbBallsBoxes ?Rand ?Std_Score
?OutsideBounds ?ProbCloseMatches ?StatTest
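The conversion from a cumulative probability c to equivalent standard deviations s can be sketched in Python with the standard library. The helper names are hypothetical. Note that this sketch first computes c and then inverts the Normal(0,1) distribution, so it does not have the precision advantage that the synopsis attributes to CumulativeStd when c is extremely close to 0 or 1; it only illustrates the definition.

```python
from math import comb, erf, sqrt
from statistics import NormalDist

def cumulative_binomial(n, p, x):
    """Discrete cumulative with half weight on f(x), as defined in the synopsis."""
    def pmf(t):
        return comb(n, t) * p**t * (1 - p)**(n - t)
    return 0.5 * pmf(x) + sum(pmf(t) for t in range(0, x))

def to_std(c):
    """Map a probability 0 < c < 1 to the Normal(0,1) quantile s."""
    return NormalDist().inv_cdf(c)

s_binom = to_std(cumulative_binomial(10, 0.5, 6))
s_unif = to_std(9.75 / 10)          # U(0,10) evaluated at x = 9.75
print(round(s_binom, 4), round(s_unif, 4))

# Round trip through c = 1/2 (1 + erf(s / sqrt(2)))
print(0.5 * (1 + erf(s_unif / sqrt(2))))
```

The two printed values can be compared with the CumulativeStd examples above.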
CurrentOff
Function CurrentOff
Option: builtin
Calling Sequence: CurrentOff()
Returns:
integer
Synopsis: Returns the current file pointer offset when reading. This runs
the C function ftell() on the current input descriptor.
DataMatrix
Data structure DataMatrix( )
Function: creates a data structure to keep a DataMatrix.
A datamatrix can be:
- an AllAll (array of matches)
- a matrix of PAM distances or other metrics
- a matrix of Scores
The data structure keeps all three kinds of data types.
If any of them is not specified, then the field is 0.
If an AllAll is given, then both score and pam matrices are extracted automatically.
If only scores are given or PAM distances, the other two fields are 0.
Selectors:
TYPE: string, describes the type of data used
DISTANCE: array of (PAM or other positive) distances
SCORE: array of scores (or other similar measures)
The type is used for example for the calculation of TSP. If the
data has a distance flavor, then shorter distances are better.
But if the data is a score, then a higher score is better.
If no type is specified, then PAM is assumed (a distance measure)
TSP: returns optimal path in the form
[a, b, c, .. , a] (the last element is repeated)
if possible (i.e. if pam data is available) use this to calculate the TSP order
The result is saved in the data structure. So it will only compute the
best order if the field is 0, otherwise the last result is returned.
RAW: matrix
returns the original data, i.e. an AllAll matrix
DATA: matrix
returns the distance or score matrix or 0 if none is there
VAR: calculates variances of PAM distances of AllAll
if no AllAll is given, it returns the data matrix
SEQ: associated sequences (optional)
Constructors:
d := DataMatrix();
d := DataMatrix("SCORE", AllAll);
d := DataMatrix("PAM", some_distance_matrix);
DayMatrix
Class DayMatrix - similarity scoring matrix or Dayhoff matrix
Template: DayMatrix(PAM)
CreateDayMatrix(logPAM1,PAM)
CreateDayMatrices()
CreateOrigDayMatrix()
Fields:
Name Type Description
-----------------------------------------------------------------------------
DelCost procedure proc(k,pam) gives cost of k-long indel
Dimension posint dimension of the similarity matrix
FixedDel numeric fixed cost (opening) for affine indels
IncDel numeric incremental cost for affine indels
logPAM1 matrix rate matrix used for this DayMatrix
Mapping procedure proc to map symbols to matrix indices
MaxOffDiag numeric maximum similarity for distinct residues
MaxSim numeric maximum similarity score in the matrix
MinSim numeric minimum similarity score in the matrix
PamDistance numeric PAM distance of this matrix
PamNumber numeric PAM distance of this matrix
Sim matrix(numeric) Similarity matrix of scores (see below)
StopSimil numeric cost of matching a stop codon
type symbol type of scoring matrix, Peptide or Nucleotide
Methods: DayMatrix_type print
Synopsis: A DayMatrix is the data structure or class which holds similarity
scores computed from mutation matrices. The matrices are used for alignment
of sequences. The scores have a precise mathematical meaning: they are 10
times the log10 of the ratio between the probability that the alignment comes
from homology and the probability that it is a random coincidence. Hence
alignment scores give a rough estimate of how rare the alignment is if it
were produced by chance only.
The functions which create DayMatrices (CreateDayMatrices) normally assign a
dense array of DayMatrix to the variable DMS (to allow estimation of
distances between sequences) and a 250-PAM matrix to DM (the most commonly
used matrix). Currently, DayMatrices are internal objects. The functions
mentioned in the Template part above are used to create DayMatrices. Some
other commonly used scoring matrices can be obtained by the command
Matrices(). When DayMatrix is used as a constructor (first entry above), it
searches the list of DayMatrix DMS for a matrix of the right PAM and returns
it. If none is found, it calls CreateDayMatrix to build an appropriate one.
When selecting the similarity matrix from a DayMatrix (selector Sim), a new
matrix is constructed and returned. If the selection on Sim is immediately
followed by two indices, then no matrix is constructed and the corresponding
entry of the Dayhoff matrix is returned. For this special case, (e.g.
DM[Sim,a,b]), the selectors a and b can be the one letter codes for the
amino acids. This is more efficient and simpler than invoking the AToInt
conversion.
Examples:
> CreateDayMatrices();
> DM[Sim,1,1];
2.3562
> DMS[100,Sim,L,I];
-17.8435
> DayMatrix(316);
DayMatrix(Peptide, pam=316, Sim: max=12.964, min=-3.989, del=-19.057-1.396*(k-1))
See Also:
?CreateDayMatrices ?CreateOrigDayMatrix ?SearchDayMatrix
?CreateDayMatrix ?Matrices
DayMatrixScale
Function DayMatrixScale
Calling Sequence: DayMatrixScale(dm)
Parameters:
Name Type
----------------
dm DayMatrix
Returns:
numeric
Synopsis: Computes the scaling factor lambda of dm such that
sum(f[i]*f[j]*exp(lambda*dm[Sim,i,j])) = 1. For Dayhoff-like matrices DM,
DayMatrixScale(DM) = ln(10)/10.
Examples:
> DayMatrixScale( DM );
0.2303
See Also:
?CreateDayMatrices ?CreateOrigDayMatrix ?DayMatrix ?SearchDayMatrix
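Why the scale of a Dayhoff-like matrix is ln(10)/10 can be illustrated with a toy model in Python. Everything below is an assumption made for the sketch: a hypothetical 3-letter alphabet, a made-up mutation matrix M, and the usual log-odds construction S[i][j] = 10*log10(M[i][j]/f[j]). With such scores, exp(lambda*S[i][j]) equals M[i][j]/f[j] when lambda = ln(10)/10, so the double sum collapses to the sum of f[i] times the row sums of M, which is 1.

```python
from math import exp, log, log10

# Hypothetical background frequencies for a 3-letter alphabet.
f = [0.5, 0.3, 0.2]
r = 0.1  # made-up overall mutation level

# M[i][j] = probability that symbol i is replaced by symbol j; rows sum to 1.
n = len(f)
M = [[r * f[j] if i != j else 1.0 - r * (1.0 - f[i]) for j in range(n)]
     for i in range(n)]

# Log-odds scores in tenths of log10 units (assumed Dayhoff-like form).
S = [[10.0 * log10(M[i][j] / f[j]) for j in range(n)] for i in range(n)]

lam = log(10) / 10
total = sum(f[i] * f[j] * exp(lam * S[i][j])
            for i in range(n) for j in range(n))

print(round(lam, 4))   # compare with DayMatrixScale(DM)
print(total)           # should be 1 up to rounding error
```

Note that lambda = 0 also satisfies the equation trivially, because the frequencies sum to 1; the scale of interest is the positive solution.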
DbToDarwin
Function DbToDarwin - Make a darwin-readable version of SwissProt
Calling Sequence: DbToDarwin(inp,outfile,descr,TagsToKeep)
Parameters:
Name Type Description
-------------------------------------------------------------------
inp string the complete input database as a string
outfile string name of the output file (database)
descr string any commentary
TagsToKeep list(string) tags to keep from SwissProt
Returns:
NULL
Synopsis: Converts a SwissProt formatted text (inp) into a file (outfile)
usable by Darwin. This program requires a lot of main memory (as much as the
original input file). Make sure that you have enough memory by using (in
unix) "unlimit datasize memoryuse".
Once the new database is created, the first time the command
"ReadDb(SwissProt40);" is executed, the index of the database will be built.
Building the index can take quite a bit of CPU time. This time is spent
only once; future uses of the database will not require any index building.
You will find that Darwin creates a file named "SwissProt40.tree". This
index file is the Pat tree for all the peptides and is needed for most of
the basic operations of Darwin. You must have write permissions in the
directory in which the database is stored to create the tree (only the first
time the database is loaded). If an index is not needed (no fast searches
will be possible), creating an empty SwissProt40.tree file will indicate to
ReadDb that the user does not want an index.
Examples:
> DbToDarwin( ReadRawFile('sprot40.dat'), 'SwissProt40',
ReadRawFile('relnotes.txt'), ['AC','DE','OS','KW'] );
See also: ?ConsistentGenome ?DB ?GenomeSummary ?ReadDb
Denormalize
Function Denormalize
Calling Sequence: Denormalize(m)
Parameters:
Name Type
------------------
m NucPepMatch
Returns:
NucPepMatch
Synopsis: Denormalizes a match that references a sequence held in memory so
that it refers to (the complement of) a NucDB database entry.
Examples:
See also: ?Normalize
Description
Class Description - contains structured information on a function
Methods: Description_type Document error HTMLC latex select
string
Synopsis: This class contains structured information on a function suitable
to build the "description" entry for a function. This structure establishes
the "official" format for description of functions in Darwin.
Description allows an arbitrary number of parameters.
The first argument describes the function/class/variable/iterator being
described. It is a structure, where the name of the structure is one of
function/class/variable/iterator and the only field is the name of the object
being described. E.g. function(sin), structure(Stat), variable('Pi').
The Paragraph(), Indent() or Table() or any other Document valid structures
can appear at any place and will insert a paragraph/table etc. of text at that
point.
The following arguments are optional, but must be given in this order.
Summary( string )
Summary has an English short description of what the function does. It
should fit in one line together with the name of the function. It is better
if it does not start with a capital letter.
CallingSequence( noeval( Func(Ver1) ),
noeval( Func(Ver2) ))
CallingSequence contains examples on how the function may be called. These
are typically surrounded by noeval() to prevent their execution. They serve
as a pattern for the one or many ways of using the object. The names of the
arguments used will be described later.
Parameters( [param1, type1, description1],
[param2, type2, description2], ...)
Parameters holds the name of the parameters, their types, and a short
description for each of them. Make sure that the 3 columns fit in the width
of a normal page (80 columns). For a data structure/class, the names
represent the fields of the structure.
Selector( [selname1, type1, description1],
[selname2, type2, description2], ... )
For data structures/classes, these provide the type and description of the
explicit and computed selectors.
Returns( type )
Returns( [type,description] )
Returns describes the value returned, when it is obvious what it is, the
type information is enough, otherwise, a description may be added.
Synopsis( string, string, ... )
Like a Paragraph, Synopsis contains the description of what the function
does/computes.
References( string, string, ... )
Provides a format for citations related to the object.
Keywords( string, string, ... )
Keywords related to this help topic.
Examples( )
Examples contains examples of how the function is used. They will appear
sequentially and with their output.
Examples( ) have five formats:
Quoted string: the statement contained in quotes is executed and the
statement and its output are printed out. If the string is terminated with a
colon (":"), then its output will not be part of the help file (like in a
Darwin session). No semicolon is needed at the end, one is added if
necessary.
Fake(commands,output): The first element is the input to Darwin, the second
element is the desired output. Nothing is evaluated (e.g. an assignment is
not executed). This is convenient when the action being described interacts
with the system (show a Plot, write a file, etc.)
Hide(command): This executes a statement but does not print out either the
input or the output. It is useful when we want to prepare for the execution
or undo some action.
Unassign(string, string, ... ): The arguments, which should be strings, are
assumed to be names that were assigned in the example and need to be
unassigned. Do not leave names assigned, as these are almost certain to cause
trouble when we generate the entire set of help files.
Print(command): The command is expected to print (which unless precautions
are taken, will end up printing in the wrong place.) This command collects the
printing output in a file and inserts it appropriately. Must be used for all
the commands which print in one way or another.
SeeAlso( token, [token,description], ... )
SeeAlso contains a series of tokens suitable for additional references (and
an optional description if necessary)
DigestAspN
Function DigestAspN - return digestion fragments from AspN
Calling Sequence: DigestAspN(seq)
Parameters:
Name Type Description
----------------------------------
seq string a protein sequence
Returns:
list(string)
Synopsis: This function returns a set of fragment sequences of seq as though
seq were digested by AspN.
See Also:
?DigestionWeights ?DynProgMass ?ProbBallsBoxes
?DigestSeq ?DynProgMassDb ?ProbCloseMatches
?DigestTrypsin ?enzymes ?Protein
?DigestWeights ?MassProfileResults ?SearchMassDb
DigestSeq
Function DigestSeq - return digestion fragments
Calling Sequence: DigestSeq(seq,enzyme)
Parameters:
Name Type
---------------
seq string
enzyme string
Returns:
list(string)
Synopsis: Return the protein fragments that would result from a digestion
with the given enzyme.
Examples:
> DigestSeq('WWWWWWPCPLTTTTTTTTT', Armillaria );
[WWWWWWP, CPLTTTTTTTTT]
See Also:
?DigestAspN ?DynProgMass ?ProbBallsBoxes
?DigestionWeights ?DynProgMassDb ?ProbCloseMatches
?DigestTrypsin ?enzymes ?Protein
?DigestWeights ?MassProfileResults ?SearchMassDb
DigestTrypsin
Function DigestTrypsin - return digestion fragments from Trypsin
Calling Sequence: DigestTrypsin(seq)
Parameters:
Name Type Description
----------------------------------
seq string a protein sequence
Returns:
list(string)
Synopsis: This function returns a set of fragment sequences of seq as though
seq were digested by trypsin.
See Also:
?DigestAspN ?DynProgMass ?ProbBallsBoxes
?DigestionWeights ?DynProgMassDb ?ProbCloseMatches
?DigestSeq ?enzymes ?Protein
?DigestWeights ?MassProfileResults ?SearchMassDb
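As a rough illustration of what a tryptic digestion produces, the following Python sketch applies the commonly cited rule that trypsin cleaves on the C-terminal side of K or R unless the next residue is P. This rule and the name digest_trypsin are assumptions of the sketch; the help text above does not state which cleavage rules Darwin's DigestTrypsin applies.

```python
def digest_trypsin(seq):
    """Split seq after every K or R that is not followed by P."""
    fragments = []
    start = 0
    for i in range(len(seq) - 1):
        # Cleave after K or R unless the next residue is proline.
        if seq[i] in "KR" and seq[i + 1] != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    # The remainder (possibly the whole sequence) is the last fragment.
    fragments.append(seq[start:])
    return fragments

print(digest_trypsin("AAKRPGGRTT"))
```

In this input the K is followed by R, so a cut is made after it, while the first R is followed by P and is left uncut.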
DigestWeights
Function DigestWeights - return weights of digestion fragments
Calling Sequence: DigestWeights(seq,enzyme)
Parameters:
Name Type Description
---------------------------------------------
seq string a protein sequence
enzyme matrix(boolean)
Returns:
list(numeric)
Synopsis: Return the weights of the protein fragments that would result from
a digestion with the given enzyme.
Examples:
> DigestWeights('WWWWWWPCPLTTTTTTTTT', Armillaria );
[1232.3950, 1241.3660]
See Also:
?DigestAspN ?DynProgMass ?ProbBallsBoxes
?DigestionWeights ?DynProgMassDb ?ProbCloseMatches
?DigestSeq ?enzymes ?Protein
?DigestTrypsin ?MassProfileResults ?SearchMassDb
DisconMinimize
Function DisconMinimize
Calling Sequence: DisconMinimize(f,iniguess,epsini,epsfinal)
Parameters:
Name Type
-------------------------
f procedure
iniguess array(numeric)
epsini numeric
epsfinal numeric
Returns:
x, f(x)
Global Variables: DisconMinimize_feval
Synopsis: Starting at iniguess, this function minimizes f until the argument
accuracy in each dimension is less than or equal to epsfinal (for
discontinuous function f).
See Also:
?BFGSMinimize ?MaxLikelihoodSize ?MinimizeBrent ?MinimizeSD
?MaximizeFunc ?Minimize2DFunc ?MinimizeFunc ?NBody
DisconnectTcp
Function DisconnectTcp
Option: builtin
Calling Sequence: DisconnectTcp()
Returns:
NULL
Synopsis: Closes the connection to the IPC daemon.
Examples:
> r := traperror(ConnectTcp('/tmp/.ipc/darwin', false));
> SendTcp('PING'); r := ReceiveTcp(3);
r := PING OK
> SendTcp('MSTAT linneus1'); r := ReceiveTcp(3);
r := DATA linneus1 0:OK ALIVE
> DisconnectTcp();
See Also:
?ConnectTcp ?ParExecuteIPC ?ReceiveDataTcp ?SendTcp
?darwinipc ?ParExecuteSlave ?ReceiveTcp
?ipcsend ?ParExecuteTest ?SendDataTcp
DoGapHeuristic
Function DoGapHeuristic( msa:MAlignment, gaph:GapHeuristic )
Heuristics for gap alignment. The following algorithms are implemented:
a) gap fusion and shifting
b) stacking of gap blocks
c) left-right shifting of gap blocks
d) random shifting of gap blocks
Parameters:
msa: MAlignment data structure. An alignment must exist
gaph: GapHeuristic data structure. See the description there.
It holds all parameters needed for the gap heuristics, such as
maxgaps (the maximum number of gaps to combine) etc.
With "gaph := GapHeuristic()" default values are used.
As a third parameter the algorithm can be specified (it can also be specified
in the GapHeuristic data structure). The following values are valid:
ALL: all heuristics are used
FUSION: gap fusion and shifting is used
STACKING: gap block stacking
SHIFTING: gap block left-right shifting is used
RANDOM: random gap block shifting is used
As a fourth parameter a flag can be specified (it can also be specified
in the GapHeuristic data structure). The following values are valid:
NORMAL: in each round the values stay the same
INCREMENTAL: in each round the values are increased by one
RANDOM: in each round the values are changed randomly
the maximum values are the ones initially used
DocEl
Class DocEl - Adds metainformation to some content
Template: DocEl(tag,content1,...)
Returns:
DocEl
Fields:
Name Type Description
-------------------------------------------------------------
tag string the tag added to the content
content_i {string,structure} the content of the element
Methods: DocEl_type HTMLC LaTeXC string
Synopsis: DocEl is only meaningful in the context of a structured output
format such as LaTeX or (X)HTML. If used in a normal print statement, DocEl
will just output the content parameters. If used in a LaTeXC statement,
DocEl will wrap the content in a latex tag.
Examples:
> d := DocEl( 'author', 'John Doe' );
d := DocEl(author,John Doe)
> print(d);
John Doe
> prints(LaTeXC(d));
\author{John Doe}
See Also:
?Block ?HTML ?List ?RunDarwinSession
?Code ?HyperLink ?Paragraph ?screenwidth
?Color ?Indent ?PostscriptFigure ?Table
?Copyright ?LastUpdatedBy ?print ?TT
?Document ?latex ?Roman ?View
Document
Class Document - holds contents of a human-readable document
Template: Document(content1,content2,...)
Returns:
Document
Fields:
Name Type Description
-------------------------------------------------------------
content_i {string,structure} the contents of the Document
Methods: Document_type HTMLC LaTeXC print string
Synopsis: The Document structure holds text and other structures which are
expected to be laid out as a Document. When a Document is converted, each
content_i is converted to the same target. Normally a Document is converted
to a string, HTML or Latex. Besides text, the following structures are
valid inside Documents:
Name/Use Description
--------------------------------------------------------------------------
Alphabetical(int) Convert a number to alphabetical numerals
Alphanumerical(int) Convert a number to alphanumerical numerals
Bold(txt,...) Bold text
Center(txt,...) Center the contents
Code(txt,...) preformatted, equally spaced text
Color(code,txt) Color the contents
Copyright(who) Insert copyright symbol, year and argument.
Font(font,txt,...) Set contents with a given font
HyperLink(txt,URL) URL linked data
Indent(txt,...) Indented data
IT(txt,...) Italic text
LastUpdatedBy(who) Convenient macro to end Document page.
List(format,txt,...) List/Definitions/bullets
MapleFormula(string) mathematical formula in Maple format
Ordinal(int) Convert a number to its ordinal ending
Paragraph(int,txt,...) A paragraph of text, lines adjusted
PlusMin(string) Expand +- to proper plus-minus symbols
PostscriptFigure(psfile,...) Figure from postscript source
Roman(int) Convert a number to roman numerals
SectionHeader(lev,txt) Section/subsection header
Size(size,txt,...) Set contents to a given size
Table(...) Tabular data
TT(txt,...) tty format (equally spaced font)
where txt means a string or any structure that will represent text.
Examples:
> d := Document( Paragraph(2,Hi), Indent(5,List(Roman,first,second)) );
d := Document(Paragraph(2,Hi),Indent(5,List(Roman,first,second)))
> print(d);
Hi
I first
II second
See Also:
?Block ?HTML ?List ?RunDarwinSession
?Code ?HyperLink ?Paragraph ?screenwidth
?Color ?Indent ?PostscriptFigure ?Table
?Copyright ?LastUpdatedBy ?print ?TT
?DocEl ?latex ?Roman ?View
DownloadURL
Function DownloadURL
Calling Sequence: DownloadURL(url,filename)
Parameters:
Name Type Description
----------------------------------------
url string a URL
filename string filename to save URL
Returns:
string
Synopsis: Downloads a URL and saves its content in a file.
See Also:
?OpenAppending ?OpenWriting ?ReadRawLine ?SearchDelim
?OpenReading ?ReadLine ?ReadURL ?SplitLines
DrawDistribution
Function DrawDistribution
Calling Sequence: DrawDistribution(sample)
Parameters:
Name Type Description
------------------------------------------------------------------
sample array([numeric, numeric]) [mean,variance] values
anything (optional) see ?PlotArguments
Returns:
NULL
Synopsis: Draws a distribution curve as a superposition of normal
distributions based on [mu,sigma^2] values. Each entry in the array sample
is interpreted as one distribution given by the pair [average,variance].
The results are usually stored in a file as with DrawPlot. They can be seen
with ViewPlot().
Examples:
> DrawDistribution( [ [0,1], [10,1], [5,10] ] ); ViewPlot();
See Also:
?BrightenColor ?DrawPointDistribution ?Set
?ColorPalette ?DrawStackedBar ?SmoothData
?DrawDotplot ?DrawTree ?StartOverlayPlot
?DrawGraph ?GetColorMap ?StopOverlayPlot
?DrawHistogram ?Plot2Gif ?ViewPlot
?DrawPlot ?PlotArguments
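The curve described in the DrawDistribution synopsis, a superposition of normal densities given as [mean,variance] pairs, can be evaluated numerically with a few lines of Python. This sketch only computes curve values and does no plotting. The plain unnormalized sum and the function names are assumptions; the help text does not say whether Darwin rescales the curve.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, variance):
    """Density of Normal(mean, variance) at x."""
    return exp(-(x - mean) ** 2 / (2 * variance)) / sqrt(2 * pi * variance)

def superposition(sample, x):
    """Sum of the normal densities described by [mean, variance] pairs."""
    return sum(normal_pdf(x, m, v) for m, v in sample)

sample = [[0, 1], [10, 1], [5, 10]]
# Evaluate the curve on a coarse grid from -5 to 15.
for x in range(-5, 16, 5):
    print(x, round(superposition(sample, x), 4))
```

The sample used here is the one from the DrawDistribution example above.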
DrawDotplot
Function DrawDotplot
Calling Sequence: DrawDotplot(data,legend)
Parameters:
Name Type
--------------------------------------------------------------------
data {array([numeric, numeric]),list(array([numeric, numeric]))}
legend string
Returns:
NULL
Synopsis: Plots data points as dots (circles, crosses, squares, triangles).
Examples:
See Also:
?BrightenColor ?DrawPointDistribution ?Set
?ColorPalette ?DrawStackedBar ?SmoothData
?DrawDistribution ?DrawTree ?StartOverlayPlot
?DrawGraph ?GetColorMap ?StopOverlayPlot
?DrawHistogram ?Plot2Gif ?ViewPlot
?DrawPlot ?PlotArguments
DrawGraph
Function DrawGraph - draw a graph in two dimensions
Calling Sequence: DrawGraph(G)
DrawGraph(G,modif)
Parameters:
Name Type Description
-------------------------------------------------------------------------
G Graph an input Graph
modif {string,symbol = anything} (optional) modifiers for the drawing
Returns:
NULL
Global Variables: printlevel
Synopsis: DrawGraph uses the plot facility to display a Graph in two
dimensions. The first argument, G, should be a Graph data structure. The
positioning of the nodes and other properties of appearance depend on the
optional arguments:
Mode of node positioning for NBody problem:
equal
equal All the edges have an equal initial distance and variance
distance The Edges' labels correspond to distances between the adjacent
nodes. The variances of the distances are assumed to be equal to
the distances. Edges having a non-positive label are ignored for
the fitting.
weight The Edges' labels correspond to weights or scores between the
adjacent nodes. They are converted to distances by taking the
inverse of the weights. Variances are assumed to be equal to the
distances. Edges having a non-positive label are ignored for the
fitting.
procedure A procedure Edge -> [dist, var] that assigns a distance and a
variance to an edge.
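The node-positioning modes above can be sketched as a small mapping from an edge label to a [dist, var] pair. This is an illustrative Python sketch, not Darwin code: the function name edge_to_dist_var is hypothetical, and the common value 1.0 used for the "equal" mode is an assumption (the text only says the distances and variances are equal).

```python
def edge_to_dist_var(label, mode="equal"):
    """Return (distance, variance) for one edge label, or None if the edge
    is ignored for the fitting. Hypothetical helper, not a Darwin function."""
    if mode == "equal":
        # Assumption: every edge gets the same initial distance and variance.
        return (1.0, 1.0)
    if label <= 0:
        # Non-positive labels are ignored in the distance and weight modes.
        return None
    if mode == "distance":
        # The label is the distance; the variance is taken equal to it.
        return (float(label), float(label))
    if mode == "weight":
        # The weight is inverted to a distance; the variance equals the distance.
        d = 1.0 / label
        return (d, d)
    raise ValueError("unknown mode: " + str(mode))
```

A user-supplied procedure Edge -> [dist, var] plays the same role as this function when none of the three built-in modes fits.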
Edge drawing and labeling:
(default) unlabeled
EdgeDrawing=unlabeled Edges are drawn without any label
EdgeDrawing=labeled The label of each edge is drawn centered on the line
and in the same color as the edge.
EdgeDrawing= A procedure (x1,y1,x2,y2,label,ts,col) -> list(
drawing commands ). x1,y1,x2,y2 are the starting and
end points of the edge, label is its label, ts the
desired textsize and col the color.
Node drawing:
(default) Nodes are represented with a circle and the node
description
NodeDrawing= A procedure (x,y,label,ts,col) -> list( drawing commands
), where the node with label 'label' is centered at
(x,y). ts is the desired textsize and col the color.
Size of Text:
TextSize= Set point-size for all text
Nodes and edges can be colored using the optional argument, which takes a list
of arguments of the following form: Color( colorname, obj1, obj2, ... ).
The objects are either Nodes(), Edges() or Edge() data structures. This
means that those edges or nodes will be colored with colorname. The valid
names for colorname are defined in lib/Color. The output is directed
according to plotsetup.
See Also:
?BrightenColor ?DrawPointDistribution ?Set
?ColorPalette ?DrawStackedBar ?SmoothData
?DrawDistribution ?DrawTree ?StartOverlayPlot
?DrawDotplot ?GetColorMap ?StopOverlayPlot
?DrawHistogram ?Plot2Gif ?string_RGB
?DrawPlot ?PlotArguments ?ViewPlot
DrawHistogram
Function DrawHistogram - single or multiple (side by side) histogram
Calling Sequence: DrawHistogram(data,labels,legend)
Parameters:
Name Type Description
--------------------------------------------------------------------------
data array(numeric) data values, dim n (single histogram) or
data matrix(numeric) data values dim m x n (multiple histograms)
labels array (optional) dim n labels of each vertical bar(s)
legend array (optional) dim m description of each histogram
anything (optional) see ?PlotArguments
Returns:
NULL
Synopsis: DrawHistogram produces a plot of a histogram of the numerical
values given in data. That is, when "data" is a single array (dim n), hence
a single histogram, each numerical value of data is represented by a
vertical bar with its height proportional to its data value. The data
values are printed at the top of each bar. When "data" is a matrix (dim m x
n), this means that m values will be plotted together; this will be done
with proportional vertical bars, side by side. To have the m values stacked
on top of each other (instead of side by side), use ?DrawStackedBar. The
results of DrawHistogram are placed in a file, following the same
conventions as DrawPlot. The plot can be seen with ViewPlot().
Examples:
> DrawHistogram( [1,2,3,4,3,2,1] ); ViewPlot();
> DrawHistogram( [ [ 38, 180, 42 ], [ 42, 40, 48] ],
[ 'politicians', 'darwin users', 'boxers'],
[ 'IQ', 'shoe size' ] );
> ViewPlot();
See Also:
?BrightenColor ?DrawPointDistribution ?Set
?ColorPalette ?DrawStackedBar ?SmoothData
?DrawDistribution ?DrawTree ?StartOverlayPlot
?DrawDotplot ?GetColorMap ?StopOverlayPlot
?DrawGraph ?Plot2Gif ?ViewPlot
?DrawPlot ?PlotArguments
DrawPlot
Function DrawPlot - produce a plot/drawing in a file
Option: builtin
Calling Sequence: DrawPlot(p,lo..hi)
DrawPlot(numlist)
DrawPlot(pairlist)
DrawPlot(objlist)
DrawPlot(plotset)
DrawPlot(plotset,lo..hi)
Parameters:
Name Type Description
---------------------------------------------------------------------------
p procedure a numerical procedure
lo..hi numeric..numeric numerical range to plot
numlist list(numeric) a list of values joined by lines, the
values are interpreted as coordinates
(i,y[i])
pairlist list([numeric, numeric]) a list of pairs (x[i],y[i]) joined
by straight lines
objlist list(object) a list of objects (described below)
plotset set(plots) a set of any of the above plots
The format of the objects in objlist is:
-----------------------------------------------------------------------
left aligned text LTEXT(x,y,string,points,angle,color)
centered text CTEXT(x,y,string,points,angle,color)
right aligned text RTEXT(x,y,string,points,angle,color)
line LINE(x1,y1,x2,y2,color,width)
closed polygon POLYGON(x1,y1,x2,y2,..., fill,color,width)
circle CIRCLE(x,y,radius,fill,color,width)
-----------------------------------------------------------------------
x,x1,x2,y,y1,.. numeric values of coordinates
points points=, size in points of the text
angle angle=, angle of the text in degrees
color color=[r,g,b], values of red/green/blue within 0..1
(color is incompatible with fill)
fill fill=, fill in color (0-black, 1-white)
(fill is incompatible with color)
width width=, width of lines in points
Returns:
NULL
Synopsis: Plots a set of objects, creating PostScript output which is stored in
a file. The name of the file can be set using Set(plotoutput). By default
it is "temp.ps". It is assumed that all the objects being drawn share the
same x and y coordinate system, that is, all the x and y values are in the
same units. Optional arguments are:
keyword description
------------------------------------------------------------------
proportional causes identical scaling for x and y axis
axis forces x and y axes to be drawn
grid forces a grid of lines to be drawn
topmargin=xx xx (user) units of space are added at the top
botmargin=xx xx (user) units of space are added at the bottom
leftmargin=xx xx (user) units of space are added as left margin
rightmargin=xx xx (user) units of space are added as right margin
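As a rough illustration of how the first three argument forms relate, the Python sketch below turns a numlist and a procedure with a range into the pairlist form. It is not Darwin code: both function names are hypothetical, the 1-based index follows the (i,y[i]) wording above, and the number of sample points n is an assumption (the text does not say how densely a procedure is sampled).

```python
def numlist_to_pairs(values):
    """Interpret a numlist as coordinates (i, y[i]), with i starting at 1."""
    return [(i, y) for i, y in enumerate(values, start=1)]


def sample_procedure(p, lo, hi, n=5):
    """Evaluate procedure p at n equally spaced points of lo..hi (n >= 2).
    The sampling density is an assumption made for this sketch."""
    step = (hi - lo) / (n - 1)
    return [(lo + k * step, p(lo + k * step)) for k in range(n)]
```

Either result is a list of (x, y) pairs that would be joined by straight lines, which is what the pairlist form describes.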
See Also:
?BrightenColor ?DrawPointDistribution ?Set
?ColorPalette ?DrawStackedBar ?SmoothData
?DrawDistribution ?DrawTree ?StartOverlayPlot
?DrawDotplot ?GetColorMap ?StopOverlayPlot
?DrawGraph ?Plot2Gif ?ViewPlot
?DrawHistogram ?PlotArguments
DrawPointDistribution
Function DrawPointDistribution - histogram of point distribution
Calling Sequence: DrawPointDistribution(data,Bars)
Parameters:
Name Type Description
-------------------------------------------------------------------
data array(numeric) data values, not necessarily ordered
Bars posint (optional) number of ranges, histogram bars
anything (optional) see ?PlotArguments
Returns:
NULL
Synopsis: DrawPointDistribution produces a plot of a histogram of the
distribution of the given data points. The data values are sorted and
classified in a number of equally spaced ranges. For each range a histogram
(vertical bar) with the number of points in that range is drawn. This
produces a discrete approximation of the density distribution of the points
in data. The data values do not need to be in order. The number of
vertical bars is automatically computed or it can be set with an optional
second argument. The results of DrawPointDistribution are placed in a file,
following the same conventions as DrawPlot. The plot can be seen with
ViewPlot().
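The classification into equally spaced ranges can be sketched as follows. This is an illustrative Python sketch, not Darwin code: the name bin_counts is hypothetical, the ranges are assumed to span min(data)..max(data), and a value on a boundary is assumed to fall in the upper range (the maximum goes in the last range). The rule Darwin uses to choose the number of bars automatically is not described here, so the sketch always takes it as an argument.

```python
def bin_counts(data, bars):
    """Count how many values of data fall in each of `bars` equal-width ranges."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bars
    counts = [0] * bars
    for v in data:
        # All values identical: everything lands in the first range.
        k = int((v - lo) / width) if width > 0 else 0
        # Clamp so that the maximum value is counted in the last range.
        counts[min(k, bars - 1)] += 1
    return counts
```

Each entry of the result corresponds to the height of one vertical bar.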
Examples:
> DrawPointDistribution( [seq(Rand(Normal),i=1..500)] ); ViewPlot();
See Also:
?BrightenColor ?DrawHistogram ?Plot2Gif ?StopOverlayPlot
?ColorPalette ?DrawPlot ?PlotArguments ?ViewPlot
?DrawDistribution ?DrawStackedBar ?Set
?DrawDotplot ?DrawTree ?SmoothData
?DrawGraph ?GetColorMap ?StartOverlayPlot
DrawSplitGraph
Function DrawSplitGraph( g:Graph, angles:array(numeric), title:string )
Draws graph g with edge e at angle angles[e[1,2]].
DrawSplits
Function DrawSplits( splits:list([numeric, set]), all:{posint,set} )
Draws a graph from a list of dSplits. all is the set of all taxa of the
split or a posint if the set is 1..all.
DrawStackedBar
Function DrawStackedBar - histogram with multiple values on each bar
Calling Sequence: DrawStackedBar(data,labels,legend)
Parameters:
Name Type Description
------------------------------------------------------------------------
data matrix(numeric) data values dim m x n
labels array (optional) dim n, labels of each vertical bar
legend array (optional) dim m, description of each stack
anything (optional) see ?PlotArguments
Returns:
NULL
Synopsis: DrawStackedBar produces a histogram of the numerical values given
in data. Each vertical bar is composed of several segments, corresponding
to the m lists of values, stacked on top of each other. The data values are
printed inside each stacked segment of the bars. To have the m values side
by side (instead of stacked), use ?DrawHistogram. The results of
DrawStackedBar are placed in a file, following the same conventions as
DrawPlot. The plot can be seen with ViewPlot().
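The difference from a side-by-side histogram is that each segment starts where the previous one ended. A minimal Python sketch of that layout follows; it is not Darwin code, the name stacked_segments is hypothetical, and the stacking order (first row of data at the bottom) is an assumption.

```python
def stacked_segments(data):
    """data is an m x n matrix. For each of the n bars, return the m
    (bottom, top) intervals of its stacked segments."""
    n = len(data[0])
    bars = []
    for j in range(n):
        bottom = 0
        segments = []
        for row in data:
            # Each segment sits on top of the running total of the rows before it.
            segments.append((bottom, bottom + row[j]))
            bottom += row[j]
        bars.append(segments)
    return bars
```

With the data of the example below, [[38, 180, 42], [42, 40, 48]], the second bar would consist of the intervals (0, 180) and (180, 220).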
Examples:
> DrawStackedBar( [ [ 38, 180, 42 ], [ 42, 40, 48] ],
[ 'politicians', 'darwin users', 'boxers'],
[ 'IQ', 'shoe size' ] );
> ViewPlot();
See Also:
?BrightenColor ?DrawPlot ?Set
?ColorPalette ?DrawPointDistribution ?SmoothData
?DrawDistribution ?DrawTree ?StartOverlayPlot
?DrawDotplot ?GetColorMap ?StopOverlayPlot
?DrawGraph ?Plot2Gif ?ViewPlot
?DrawHistogram ?PlotArguments
DrawTree
Function DrawTree - general front-end for drawing phylogenetic trees
Calling Sequence: DrawTree(tree,method,modif)
Parameters:
Name Type Description
---------------------------------------------------------------------------
tree Tree input tree to draw
method string (optional) method to display the tree
modif {string,symbol = anything} optional modifiers for the drawing
Returns:
NULL
Synopsis: DrawTree draws a phylogenetic tree and produces a file containing
PostScript commands. This is a single interface for all the methods and
variants that we could imagine for drawing phylogenetic trees. The tree
must contain length information in its nodes, as is commonly the case for
trees produced by the tree-building functions. The behaviour is classified
according to the following phases:
Mode of tree display:
(default) Vertical
Vertical horizontally equally spaced leaves, vertical height preserved
Unrooted planar representation, root is only identified by a small
circle, branch distances are preserved. Also called Splat trees
Radial leaves are on equally spaced directions from the root, distances
to the root preserved
RadialLines like Radial, with arcs indicating distances
Phylogram left to right horizontal branches, branch lengths preserved
Cladogram left to right horizontal branches, branches to leaves stretched
to align right
Bisect like Radial, but parent is on bisector line
BisectLines like Bisect, with arcs indicating distances
ArcRadial a Cladogram drawn with polar coordinates
Reordering of leaves:
(default) use the ordering in the Tree
OrderLeaves= permute the left-right subtrees to make the clusters
as contiguous as possible
OrderLeaves=LeftHeavy permute the left-right subtrees to make the left
subtrees the largest
OrderLeaves=Random randomly permute the left-right subtrees to (possibly)
obtain better looking trees
Branch labelling:
(default) Adaptive, 2-digit precision, branch labelling
LengthFormat= A string which is interpreted as a format of an
sprintf call with the length of the branch. If set to
the empty string, no branch labelling will happen.
LengthFormat= A procedure: (Length) -> string which takes the branch
length as an argument and produces the string to be
placed on the branch.
BranchDrawing= A procedure that will do all the branch drawing. (x1,
y1, x2,y2, l) -> list( drawing commands). The branch
spans from (x1,y1) to (x2,y2) and has a branch length
l. Use ShowBootstrap to display bootstrapping values
on the branches.
Internal Nodes:
(default) no labelling happens for internal nodes
InternalNodes= A procedure (Tree,x,y) -> list( drawing commands) which
will be invoked every time that an internal node
(identified by Tree), is drawn at position (x,y).
ShowBootstrap would display the bootstrapping values for
internal nodes if they are present in the fourth field
of the Tree data structure.
Leaf display information:
(default) circle with leaf[Label] written. If the Leaf contains
additional arguments of the form: Shape = sss or Color =
ccc, then the Leaf is displayed using the shape sss and
color ccc. Alternatively, if the Label is the structure
Color(colorcode,xxx), then xxx will be taken as the Label
and will be colored with the given colorcode.
Legend leaf[Label] written (no circle)
LeafDrawing= A procedure (Leaf,x,y) -> list( drawing commands) to
display the Leaf centered at (x,y).
Clusters= color and shape according to cluster
RadialLabels leaf labels radial
Cross referencing:
(default) no cross referencing, all labelling is done with leaf[Label]
CrossReference all labelling is done with an alphanumeric character and
leaf[Label] is cross referenced on the right
Title:
Title=anything Title to appear centered at the bottom
Size of Text:
TextSize= Set point-size for all text
Minimum branch length:
MinBranchLength=positive Force all branches to be of a minimum length. The
labelling will be done with the original lengths, but the
drawing will use this minimum value. This is a useful option
when part of the tree is cramped together and difficult to
see. The proportions will not be maintained, but the tree
can be understood. It is recommended to display the edge
lengths if this option is used.
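The effect of MinBranchLength can be pictured as keeping two values per branch: the length used for drawing and the text used for the label. The Python sketch below is illustrative only and not Darwin code; the function name is hypothetical, and the "%.2g" format merely stands in for the default two-digit labelling, whose exact rule is not given here.

```python
def drawn_and_labelled(lengths, min_len, fmt="%.2g"):
    """For each branch length return (length used for drawing, label text).
    The drawing uses at least min_len; the label keeps the original length."""
    return [(max(length, min_len), fmt % length) for length in lengths]
```

A branch of length 0.001 drawn with a minimum of 0.1 is therefore stretched in the picture while its label still reports 0.001, which is why displaying the edge lengths is recommended with this option.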
list of drawing commands:
CTEXT(...) Centered text (as for DrawPlot)
LTEXT(...) Left aligned text (as for DrawPlot)
RTEXT(...) Right aligned text (as for DrawPlot)
LINE(...) Line (as for DrawPlot)
POLYGON(...) Closed polygon (as for DrawPlot)
CIRCLE(...) Circle (as for DrawPlot)
In all cases, the cluster specification provides the definition of the
clusters, or groups of leaves. This can be done as:
list(anything) the numbering in the leaves is used as an index in this list,
and the value is the cluster name. Clustering will be done on
equal values.
procedure as above, but the value is obtained by running the procedure
on the Leaf.
Drawing of lateral gene transfer (LGT) arrows: in the ArcRadial tree display,
arrows can be drawn to depict LGTs. Each LGT is characterized by its
two endpoints, defined in a list placed in the 4th field of the relevant
Tree() structure (or the 3rd field of a Leaf() structure), as follows: [
'unique id', {'start','end'}, height, (optionally, an RGB color triplet)].
A list of drawing commands is composed of the objects (as defined in
?DrawPlot) LTEXT, CTEXT, RTEXT, LINE, POLYGON and CIRCLE.
See Also:
?BootstrapTree ?Leaf ?SignedSynteny ?Tree
?DrawPlot ?LeastSquaresTree ?Synteny ?ViewPlot
?GapTree ?PhylogeneticTree ?SystemCommand
DynProgGap
Function DynProgGap( seq1:string, seq2:string )
Does dynamic programming between the two sequences, but the sequences
may have gaps. Gap against gap is scored 0. Implementation of Gotoh's algorithm.
An additional optional parameter
window: integer
can be passed.
If window > 0, the pam variance along the sequence is estimated by
sliding a window along a match and for each stretch the best
pam distance is calculated. For this "normal" dynamic programming
without alignment of gaps is used. For both sequences a list of
pam distances is used.
Then the dynamic programming is repeated, but this time using a different
Dayhoff matrix at each position of the match that was determined.
If there is a deletion in seq1, pam1 is used and vice versa.
If there is a match, both pam distances are used to score it, and
the distance (and score) with the better score is kept.
DynProgMass
Function DynProgMass - matches digestion fragments with a sequence
Option: builtin
Calling Sequence: DynProgMass(p,seq,stddev,deb)
Parameters:
Name Type Description
----------------------------------------
p {array,structure}
seq string
stddev numeric
deb numeric
Returns:
NULL
Synopsis: Matches a Carboxypeptidase A digest (Fragment) with a sequence
using dynamic programming.
Data structure of fragment:
[[-2.0023, P]], [[-1.2703, GV],[-1.2703, VG],[-0.9824, R]], ....
[[-1.8961, T]], [0, 104.0941]]
Examples:
See Also:
?DigestAspN ?DigestWeights ?ProbBallsBoxes
?DigestionWeights ?DynProgMassDb ?ProbCloseMatches
?DigestSeq ?enzymes ?Protein
?DigestTrypsin ?MassProfileResults ?SearchMassDb
DynProgMassDb
Function DynProgMassDb - matches digestion fragments with a database
Option: builtin
Calling Sequence: DynProgMassDb(p,m,term,df,stddev,ddb)
Parameters:
Name Type Description
-------------------------------
p array
m integer
term string
df database
stddev numeric
ddb numeric
Returns:
NULL
Synopsis: Matches a Carboxypeptidase A digest (Fragment) against the whole
database
Examples:
See Also:
?DigestAspN ?DigestWeights ?ProbBallsBoxes
?DigestionWeights ?DynProgMass ?ProbCloseMatches
?DigestSeq ?enzymes ?Protein
?DigestTrypsin ?MassProfileResults ?SearchMassDb
DynProgNucPepString
Function DynProgNucPepString
Option: builtin
Calling Sequence: DynProgNucPepString(npm)
Parameters:
Name Type
------------------
npm NucPepMatch
Returns:
NULL
Synopsis: Returns two strings defining the alignment of the NucPepMatch, suitable
for printing. npm[NucGaps], npm[PepGaps] and npm[Introns] must be defined.
Examples:
See Also:
?AlignNucPepAll ?GetPosition ?NucPepDynProg
?AlignNucPepMatch ?GlobalNucPepAlign ?NucPepMatch
?Denormalize ?Intron ?NucPepRegions
?FindNucPepPam ?LocalNucPepAlign ?ParallelAllNucPepMatches
?Gene ?LocalNucPepAlignBestPam ?PepDB
?GetAllNucPepMatches ?Normalize ?ScoreIntron
?GetIntrons ?NucDB ?VisualizeGene
?GetPeptides ?NucPepBackDynProg ?VisualizeProtein
DynProgScore
Function DynProgScore - compute the forward phase of sequence alignment
Option: builtin
Calling Sequence: DynProgScore(seq1,seq2,dm,modif)
Parameters:
Name Type Description
-----------------------------------------------------------------------------
seq1 {ProbSeq,string} first sequence to be aligned
seq2 {ProbSeq,string} second sequence to be aligned
dm {DayMatrix,list(DayMatrix)} Dayhoff matrix to use for the alignment
modif {string,set(string)} specification of alignment
Returns:
{[Score:numeric],[score:numeric, from1..to1, from2..to2]}
Synopsis: Computes the optimal cost of the alignment between seq1 and seq2
using the Dayhoff matrix dm, a specified alignment mode and a specified
deletion cost model. It returns a triplet: [ Score, from1..to1, from2..to2
] or [ Score ] where Score is the optimal score of the alignment and
seq1[from1..to1] and seq2[from2..to2] are the selected portions of the
sequences to align. seq1 and seq2 can be either peptide sequences,
nucleotide sequences or probabilistic sequences, ProbSeq(). Modif is a set
of strings which have the following meanings:
For the alignment type, one of the following can be specified:
Local - (default) a local alignment, the subsequences of seq1 and seq2
which give the highest score.
Global - a global alignment, the entire seq1 is matched against the
entire seq2.
CFE - cost-free ends, the entire seq1 is matched against seq2, but one
deletion at the ends is not penalized.
CFEright - cost-free ends, the entire seq1 is matched against seq2, but
one deletion at the right end is not penalized.
Shake - align seq1 and seq2 up to the point where the maximum score
happens. Then do the same backwards and forwards until no
improvements of the score happen.
MinLength(k) - align seq1 and seq2 as in a Local alignment (starting
anyplace, ending anyplace) but at least k amino acids of each
sequence are aligned. I.e. the minimum of the aligned lengths
is k or larger.
For the deletion cost model, one of the following can be specified:
Affine - (default) deletion cost of a gap of length k is FixedDel +
IncDel*(k-1). The values for FixedDel and IncDel are taken
from the Dayhoff matrix dm.
LogDel - logarithmic deletion cost, the cost of a gap of length k is
DelFixedLog + DelLog*log(k). The values for DelFixedLog and
DelLog are taken from the Dayhoff matrix dm.
For the type of result, any combination of the following can be specified:
JustScore - only the score is computed, and the locations of the match
are not returned (this makes the algorithm run faster for Local
and CFE).
NoSelf - compute an alignment where matches of a position with itself are
disallowed. This is relevant when aligning a sequence with
itself with the purpose of discovering repeated motifs.
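The two deletion cost models listed above are simple functions of the gap length k. The Python sketch below restates them; it is not Darwin code. The parameter names mirror FixedDel, IncDel, DelFixedLog and DelLog from the text, the natural logarithm is assumed for log(k), and any numeric values passed in are hypothetical rather than taken from a real Dayhoff matrix.

```python
import math


def affine_gap_cost(k, fixed_del, inc_del):
    """Affine model: FixedDel + IncDel*(k-1) for a gap of length k >= 1."""
    return fixed_del + inc_del * (k - 1)


def log_gap_cost(k, del_fixed_log, del_log):
    """Logarithmic model: DelFixedLog + DelLog*log(k) for a gap of length k >= 1.
    Assumes the natural logarithm."""
    return del_fixed_log + del_log * math.log(k)
```

In both models a gap of length 1 costs just the fixed term; the models differ in how quickly longer gaps become more expensive.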
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> DynProgScore(AC(P00083),AC(P00091),DM,Local);
[177.7799, 14..92, 19..97]
> DynProgScore(AC(P00083),AC(P00091),DM,Global);
[144.4751, 1..127, 1..139]
> DynProgScore(AC(P00083),AC(P00091),DM,{CFE,JustScore});
[174.0188]
> DynProgScore('ADEFGHIKSDEFGHLK','ADEFGHIKSDEFGHLK',DM,NoSelf);
[75.0720, 1..16, 1..16]
See also: ?Align ?Alignment ?CreateDayMatrices ?MAlign
DynProgStrings
Function DynProgStrings - compute score and aligned strings from a Match
Option: builtin
Calling Sequence: DynProgStrings(m,dm)
DynProgStrings(m,dm,NoSelf)
DynProgStrings(al)
Parameters:
Name Type Description
-------------------------------------------------------------------
m Match input Match
dm DayMatrix scoring matrix
NoSelf string (optional), no self alignments will be allowed
al Alignment input Alignment object
Returns:
[numeric, string, string] : [score,seq1,seq2]
Synopsis: Returns a list with the similarity score, first sequence and second
sequence suitable for printing the given match with the given similarity
matrix. The sequences are the original sequences from the match with
inserted '_' as needed to produce the desired alignment. If a third
argument is provided, it must be the keyword 'NoSelf'. This is an
indication that no position will be aligned with itself, a situation useful
for the detection of repetitious patterns. If an Alignment is provided, all
the information is contained in the object, and no additional arguments are
needed.
Examples:
> al := Align('ADEFGHIKLMNNW','ADEFGKLMNNW');
al := Alignment('ADEFGHIKLMNNW','ADEFGKLMNNW',36.4025,DM,0,0,{Local})
> DynProgStrings(Match(al),DM);
[36.4025, ADEFGHIKLMNNW, ADEFG__KLMNNW]
> seq1 := 'ADEFGHIKSDEFGHLK';
seq1 := ADEFGHIKSDEFGHLK
> al := Align(seq1,seq1,NoSelf);
al := Alignment('ADEFGHIK','SDEFGHLK',35.0800,DM,0,0,{Local,NoSelf})
> DynProgStrings(Match(al),DM,NoSelf);
[35.0800, ADEFGHIK, SDEFGHLK]
> DynProgStrings(al);
[35.0800, ADEFGHIK, SDEFGHLK]
See also: ?Align ?CodonDynProgStrings ?Match ?print
Edge
Class Edge - edge/arc description
Template: Edge(Label,From,To)
Returns:
Edge
Fields:
Name Type Description
----------------------------------------------------
Label anything the label of the edge
From anything the first end point of the edge.
To anything the second end point of the edge.
Methods: Edge_type select
Synopsis: The Edge data structure stores the information associated with an
edge. Some algorithms assume that the Label field stores a numeric value
representing a weight. The Edges are always directed, but if the graph is
meant to be undirected, then the From/To are exchangeable and only one entry
per Edge is needed.
Examples:
> G := Graph( Edges( Edge(4,1,2), Edge(7,1,3), Edge(6,2,4),
Edge(5,3,4) ), Nodes(1, 2, 3, 4) );
G := Graph(Edges(Edge(4,1,2),Edge(7,1,3),Edge(6,2,4),Edge(5,3,4)),Nodes(1,2,3,4))
> G[Edges, 1, Label];
4
See Also:
?BipartiteGraph ?Graph_Rand ?ParseDimacsGraph
?Clique ?Graph_XGMML ?Path
?DrawGraph ?InduceGraph ?RegularGraph
?EdgeComplement ?MaxCut ?ShortestPath
?Edges ?MaxEdgeWeightClique ?TetrahedronGraph
?FindConnectedComponents ?MinCut ?VertexCover
?Graph ?MST
?Graph_minus ?Nodes
EdgeComplement
Function EdgeComplement - construct the graph on the complementary edges
Calling Sequence: EdgeComplement(Graph)
Parameters:
Name Type Description
------------------------------
Graph Graph an input graph
Returns:
Graph
Synopsis: Computes the complement graph of the input. This is a graph over
the same set of nodes, but with edges where there were no edges and vice
versa. The labels of the old edges are lost and the new edges are assigned
a 0 label.
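The construction can be sketched in a few lines of Python. This is an illustration, not Darwin code: the function name is hypothetical, edges are given as (from, to) pairs and treated as undirected, and each new edge is returned as a (0, from, to) triple to mirror Edge(0,from,to).

```python
from itertools import combinations


def edge_complement(nodes, edges):
    """Return the complement edges as (0, u, v) triples.
    nodes: list of node names; edges: list of (u, v) pairs, undirected."""
    present = {frozenset(e) for e in edges}
    return [(0, u, v)
            for u, v in combinations(nodes, 2)
            if frozenset((u, v)) not in present]
```

For a graph on 8 nodes with 12 edges, such as the hexahedron in the example below, the complement has 28 - 12 = 16 edges.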
Examples:
> hex := HexahedronGraph();
hex := Graph(Edges(Edge(0,1,2),Edge(0,1,4),Edge(0,1,5),Edge(0,2,3),Edge(0,2,6),Edge(0,3,4),Edge(0,3,7),Edge(0,4,8),Edge(0,5,6),Edge(0,5,8),Edge(0,6,7),Edge(0,7,8)),Nodes(1,2,3,4,5,6,7,8))
> EdgeComplement(hex);
Graph(Edges(Edge(0,1,3),Edge(0,1,6),Edge(0,1,7),Edge(0,1,8),Edge(0,2,4),Edge(0,2,5),Edge(0,2,7),Edge(0,2,8),Edge(0,3,5),Edge(0,3,6),Edge(0,3,8),Edge(0,4,5),Edge(0,4,6),Edge(0,4,7),Edge(0,5,7),Edge(0,6,8)),Nodes(1,2,3,4,5,6,7,8))
See Also:
?BipartiteGraph ?Graph_Rand ?ParseDimacsGraph
?Clique ?Graph_XGMML ?Path
?DrawGraph ?InduceGraph ?RegularGraph
?Edge ?MaxCut ?ShortestPath
?Edges ?MaxEdgeWeightClique ?TetrahedronGraph
?FindConnectedComponents ?MinCut ?VertexCover
?Graph ?MST
?Graph_minus ?Nodes
Eigenvalues
Function Eigenvalues - Eigenvalue/vector decomposition of a symmetric matrix
Option: builtin
Calling Sequence: Eigenvalues(A,eigenvects)
Parameters:
Name Type Description
---------------------------------------------
A matrix a symmetric matrix
eigenvects name an optional matrix name
Returns:
list(numeric)
Synopsis: Compute an eigenvalue/eigenvector decomposition of A. A must be a
symmetric matrix. The function returns the vector containing the
eigenvalues in increasing order. The optional second argument, if present,
must be a name, which will be assigned the matrix of eigenvectors.
The eigenvectors have norm 1 and are stored columnwise and the ith column
corresponds to the ith eigenvalue.
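The properties stated above (A*v = lambda*v for each eigenvalue and its column, and norm 1) can be checked independently of Darwin. The Python sketch below does only that check; it is not an eigenvalue solver and not Darwin code, the helper names are hypothetical, and the tolerance reflects the four printed digits of the example values.

```python
def mat_vec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def is_eigenpair(A, lam, v, tol=1e-3):
    """True if A*v equals lam*v componentwise within tol."""
    return all(abs(av - lam * x) <= tol for av, x in zip(mat_vec(A, v), v))


def norm(v):
    """Euclidean norm of v."""
    return sum(x * x for x in v) ** 0.5
```

Applied to the matrix A of the example below, is_eigenpair(A, 0.4921, [0.6041, -0.6782, -0.4185]) holds, and the norm of that vector is 1 up to rounding.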
Examples:
> A := [[3,1,2],[1,2,-1],[2,-1,5]];
A := [[3, 1, 2], [1, 2, -1], [2, -1, 5]]
> alpha := Eigenvalues(A,V);
alpha := [0.4921, 3.2444, 6.2635]
> Vt := V^t;
Vt := [[0.6041, -0.6782, -0.4185], [0.6191, 0.7301, -0.2894], [0.5018, -0.08423029, 0.8609]]
> A*Vt[1] = alpha[1]*Vt[1];
[0.2973, -0.3337, -0.2059] = [0.2973, -0.3337, -0.2059]
> Vt[2]*Vt[2];
1.0000
See Also:
?Cholesky ?GivensElim ?matrix ?transpose
?convolve ?Identity ?matrix_inverse
?GaussElim ?LinearProgramming ?SvdAnalysis
EnterProfile
Function EnterProfile
Option: builtin
Calling Sequence: EnterProfile(blockname)
Parameters:
Name Type
------------------
blockname string
Returns:
NULL
Synopsis: This function is used to identify the beginning of a block to be
profiled. An EnterProfile should always be matched to an ExitProfile which
should be at the same level (in the same statement sequence) and should have
the same blockname. The code surrounded by the EnterProfile and ExitProfile
will be profiled under the name given by blockname. Many Enter/ExitProfile
pairs may be used, with or without the same blockname. Run time
statistics will be grouped by blockname. Enter/ExitProfile pairs cannot be
nested within the same statement sequence.
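The bookkeeping described above, with matched enter/exit calls and statistics grouped by block name, can be imitated in a few lines. The Python sketch below is an analogue for illustration only, not the Darwin builtin: all names are hypothetical, and refusing to re-enter a block that is still open is one simple way of modelling the no-nesting rule.

```python
import time
from collections import defaultdict

# Accumulated statistics per block name.
stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})
_open_blocks = {}


def enter_profile(blockname):
    """Mark the beginning of a profiled block."""
    if blockname in _open_blocks:
        raise RuntimeError("block already open: " + blockname)
    _open_blocks[blockname] = time.perf_counter()


def exit_profile(blockname):
    """Mark the end of a profiled block and add its run time to the totals."""
    start = _open_blocks.pop(blockname)  # KeyError if there is no matching enter
    stats[blockname]["calls"] += 1
    stats[blockname]["seconds"] += time.perf_counter() - start
```

Using the same block name for several pairs adds their run times together, which corresponds to the grouping by blockname described in the synopsis.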
Examples:
> EnterProfile(longloop);
> s:=0: for i to 10^5 do s := s+1/i od;
12.0901
> ExitProfile(longloop);
See also: ?ExitProfile ?profiling
Entry
Function Entry - return entries from the database DB
Option: polymorphic
Calling Sequence: Entry(a)
Parameters:
Name Type Description
-------------------------------------------------------------------------------------------------
a {integer,string,structure,list(integer)} Entry number(s) or other description of entries
Returns:
{expseq(string),string}
Synopsis: Entry returns the string(s) corresponding to the entries in the
database DB described. This can be through entry numbers, PatEntry, Match,
ID, AC or partial references to entries.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> e1 := Entry(1);
e1 := 104K_THEPAP15711;104 kDa ..(1255).. L
> Entry(PatEntry(10000..10001));
SYP1_YEASTP25623; P25622; Q96VH0;< ..(1372).. A, SYP_CHLPNQ9Z851; Q9JSE4;P ..(1517).. A
> Entry(AC('P11341'));
VG9_SPV4P11341;Gene 9 pro ..(266).. R
> Entry(ID('ID5B_PROJU'));
ID5B_PROJUP32734;Kunitz-t ..(678).. G
> s1 := Sequence(e1);
s1 := MKFLILLFNILCLFPVLAADNHGVGPQGASGVDPITFDINSNQTGPAFLT ..(924).. ILVVSLIVGIL
> Entry(e1);
104K_THEPAP15711;104 kDa ..(1255).. L
> GetEntryNumber(e1);
1
See Also:
?AC ?GetEntryNumber ?Match ?SearchTag
?GetEntryInfo ?ID ?PatEntry ?Sequence
EstimateCodonPAM
Function EstimateCodonPAM - finds the best-scoring CodonPAM matrix for an
alignment
Calling Sequence: EstimateCodonPAM(dps1,dps2,cms)
Parameters:
Name Type Description
---------------------------------------------------------
dps1 string First of the aligned sequences.
dps2 string Second of the aligned sequences.
cms list(DayMatrix) array of codon scoring matrices
Returns:
[Score, CodonPAM, CodonPAMVariance]
Synopsis: Given two codon-wise aligned DNA sequences, this function finds
the best-scoring CodonPAM matrix. Analogous to EstimatePam, it returns a
list containing the score, the CodonPAM estimate and the CodonPAM variance.
Examples:
> EstimateCodonPAM(AAACCCGGGTTT,AAACCG___TTT,CMS);
[11.0814, 91, 1653.3145]
See Also:
?CodonAlign ?CreateCodonMatrices ?EstimateSynPAM
?CodonDynProgStrings ?EstimatePam
EstimateNG86
Function EstimateNG86
Calling Sequence: EstimateNG86(seq1,seq2)
Parameters:
Name Type Description
------------------------------------
seq1 string aligned DNA sequence
seq2 string aligned DNA sequence
Returns:
array(numeric)
Synopsis: Computes dN and dS following the method by Nei and Gojobori (1986).
The function returns four values, dN and dS as well as the number of
nonsynonymous (N) and synonymous sites (S). If either dN or dS cannot be
computed (typically because of too much divergence), -1 is returned for the
respective value.
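In the published Nei-Gojobori method, the proportion p of differing sites (differences divided by sites, separately for nonsynonymous and synonymous changes) is converted to a distance with the Jukes-Cantor correction d = -(3/4)*ln(1 - 4p/3), which is undefined once p reaches 3/4. The Python sketch below shows only that last step; it is not Darwin code, the function name is hypothetical, and it is an assumption that Darwin's implementation computes the value in exactly this way, although the figures in the example below are consistent with it.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites.
    Returns -1 when the correction is undefined (too much divergence)."""
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        return -1.0
    return -0.75 * math.log(arg)
```

In the example below the two sequences differ by one nonsynonymous change (TTT against TTA) and one synonymous change (AAA against AAG), so p is 1/N and 1/S respectively for the reported N and S.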
Examples:
> EstimateNG86(AAAAAATTT,AAAAAGTTA);
[0.1435, 3.5455, 7.6548, 1.3452]
See also: ?CodonAlign ?EstimatePB93 ?EstimateSynPAM
EstimatePB93
Function EstimatePB93
Calling Sequence: EstimatePB93(seq1,seq2)
Parameters:
Name Type Description
------------------------------------
seq1 string aligned DNA sequence
seq2 string aligned DNA sequence
Returns:
array(numeric)
Synopsis: Computes Ka and Ks following the method by Pamilo and Bianchi
(1993). The function returns a list [Ka,Ks] with the two estimates. If
these values cannot be computed (typically because of too much divergence),
then [-1,-1] is returned.
Examples:
> EstimatePB93(AAAAAATTT,AAAAAGTTA);
[0.1648, 0.7611]
See also: ?CodonAlign ?EstimateNG86 ?EstimateSynPAM
EstimatePam
Function EstimatePam
Option: builtin
Calling Sequence: EstimatePam(s1,s2,days)
Parameters:
Name Type
-----------------------
s1 string
s2 string
days array(DayMatrix)
Returns:
[Score,PamDistance,PamVariance]
Synopsis: Calculates the similarity score, Pam distance and Pam variance for
the alignment defined by s1 and s2. Notice that s1 and s2 are taken as
aligned already, that is, they are not re-aligned. If s1 and s2 need to be
aligned, use DynProgStrings first. The estimation of the Pam distance and
variance is normally done by Align when given an array of Dayhoff matrices.
If the estimated distance is lower than 0.1 pam, the estimate is also
computed by expected values (the computation of distances by maximum
likelihood becomes less accurate). This second estimate is stored in the
global variable ExpectedPamDistance. The computation of the PamDistance by
maximum likelihood (exactly, not just for an existing DM in days) is stored
in the global variable MLPamDistance.
Examples:
> EstimatePam('CITKLFDGDQVLY', Mutate('CITKLFDGDQVLY', 100), DMS);
[73.1848, 61, 822.4780]
See Also:
?Align ?DynProgStrings ?EstimateSynPAM
?CalculateScore ?EstimateCodonPAM
EstimateSynPAM
Function EstimateSynPAM - finds the best-scoring SynPAM matrix for an
alignment
Calling Sequence: EstimateSynPAM(dps1,dps2)
Parameters:
Name Type Description
------------------------------------------------
dps1 string First of the aligned sequences.
dps2 string Second of the aligned sequences.
Returns:
[Score, SynPAM, SynPAMVariance]
Synopsis: Given two codon-wise aligned DNA sequences, this function finds
the best-scoring SynPAM matrix. Analogous to EstimatePam, it returns a list
containing the score, the SynPAM estimate and the SynPAM variance.
Examples:
> EstimateSynPAM(AAACCCGGGTTT,AAACCG___TTT);
[2.3328, 51.9870, 942.8518]
See Also:
?CodonAlign ?CreateSynMatrices ?EstimatePam
?CodonDynProgStrings ?EstimateCodonPAM ?EstimatePB93
?CodonMatrix ?EstimateNG86
EvolTree
Data structure EvolTree( )
Function: creates an EvolTree data structure
Selectors:
Tree: Tree
TC: TreeConstruction type (how was the tree constructed)
Data: DataMatrix
Index: Tree fitting index
PAM: Total pam length of tree
Score: Score of tree
Order: TSP order
Other selectors not contained directly in data structure
n: number of leaves
leaves: returns a list of leafnames of tree
Constructors:
EvolTree(Tree)
EvolTree(Tree, TC)
EvolTree(Tree, TC, Data, Index, PAM, Score, Order)
ExitProfile
Function ExitProfile
Option: builtin
Calling Sequence: ExitProfile(blockname)
Parameters:
Name Type
------------------
blockname string
Returns:
NULL
Synopsis: This function is used to identify the ending of a block to be
profiled. An ExitProfile should always be matched to a previous
EnterProfile which should be at the same level (in the same statement
sequence) and should have the same blockname. The code surrounded by the
EnterProfile and ExitProfile will be profiled under the name given by
blockname. Many pairs of Enter/ExitProfile may be used, with or without the
same blockname. Run time statistics will be grouped by blockname. Enter/
ExitProfile pairs cannot be nested within the same statement sequence.
Examples:
> EnterProfile(longloop);
> s:=0: for i to 10^5 do s := s+1/i od;
12.0901
> ExitProfile(longloop);
See also: ?EnterProfile ?profiling
ExpFit
Function ExpFit - Least squares exponential fit: y[i] ~ a + b * exp(c*x[i])
Calling Sequence: ExpFit(y,x)
Parameters:
Name Type Description
--------------------------------------------
y array(numeric) dependent variable
x array(numeric) independent variable
Returns:
[a,b,c,sumsq]
Synopsis: Compute a least squares fit of the type:
y[i] ~ a + b * exp(c*x[i])
where a,b and c are the parameters of the approximation and sumsq is the sum
of the squares of the errors of the approximation.
Examples:
> x := [1,2,3,4,5];
x := [1, 2, 3, 4, 5]
> y := [0.49, 1.02, 2.1, 4.01, 7.8];
y := [0.4900, 1.0200, 2.1000, 4.0100, 7.8000]
> ExpFit(y,x);
[-0.1014, 0.3113, 0.6467, 0.00204305]
See also: ?ExpFit2 ?LinearRegression ?Stat
ExpFit2
Function ExpFit2 - Least squares exponential fit: y[i] ~ a * exp(b*x[i])
Calling Sequence: ExpFit2(y,x)
Parameters:
Name Type Description
--------------------------------------------
y array(numeric) dependent variable
x array(numeric) independent variable
Returns:
[a, b, sumsq]
Synopsis: Compute the least squares fit of the type y[i] ~ a * exp(b * x[i]).
sumsq is the sum of squares of the approximation errors.
Examples:
> x := [1,2,3,4,5];
x := [1, 2, 3, 4, 5]
> y := [0.49, 1.02, 2.1, 4.01, 7.8];
y := [0.4900, 1.0200, 2.1000, 4.0100, 7.8000]
> ExpFit2(y,x);
[0.2771, 0.6677, 0.00586200]
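Note: the following is an illustrative Python sketch, not Darwin code and not
necessarily the algorithm used by ExpFit2. It fits y[i] ~ a * exp(b*x[i]) by
ordinary linear regression of ln(y[i]) on x[i], which assumes all y[i] > 0.
The name exp_fit2_loglinear is hypothetical. Because this minimizes errors on
the logarithmic scale, its result on noisy data need not match the output of
ExpFit2 shown above.

```python
import math

def exp_fit2_loglinear(y, x):
    """Fit y[i] ~ a * exp(b * x[i]) by regressing ln(y) on x.

    Returns [a, b, sumsq], where sumsq is the sum of squared errors
    measured on the original (not logarithmic) scale.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    if any(v <= 0 for v in y):
        raise ValueError("the log-linear fit needs every y[i] > 0")
    n = len(x)
    ly = [math.log(v) for v in y]
    mx = sum(x) / n
    ml = sum(ly) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxl = sum((xi - mx) * (li - ml) for xi, li in zip(x, ly))
    b = sxl / sxx                      # slope of ln(y) against x
    a = math.exp(ml - b * mx)          # intercept mapped back from log scale
    sumsq = sum((yi - a * math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))
    return [a, b, sumsq]
```

On data that follows the model exactly, the sketch recovers a and b up to
rounding error; a true least squares fit on the original scale would normally
use these values only as a starting point.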
See also: ?ExpFit ?LinearRegression ?Stat
ExpandFileName
Function ExpandFileName( dir:string, name:string )
Generate file name from directory and name.
Exponential_Rand
Function Exponential_Rand - Generate random exponentially distributed reals
Calling Sequence: Rand(Exponential(a,b))
Returns:
numeric
Synopsis: This function returns a random exponentially distributed number
with average a+b and variance b^2. In mathematical terms, the probability
that the outcome is x is exp( -(x-a)/b ) / b. The first parameter, a, can
take any arbitrary value. The second parameter, b, has to be positive.
Exponential_Rand uses Rand() which can be seeded by either the function
SetRand or SetRandSeed.
References: Handbook of Mathematical functions, Abramowitz and Stegun,
26.1.28
Examples:
> Rand(Exponential(0.3,3));
2.4395
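Note: the following is an illustrative Python sketch, not Darwin code and not
necessarily how Exponential_Rand is implemented. It draws from the density
exp( -(x-a)/b ) / b for x >= a by inverting the cumulative distribution. The
name exponential_rand and the rng parameter are hypothetical.

```python
import math
import random

def exponential_rand(a, b, rng=random):
    """Return one sample with density exp(-(x-a)/b)/b on x >= a."""
    if b <= 0:
        raise ValueError("b has to be positive")
    u = rng.random()                 # uniform on [0, 1)
    # 1 - u lies in (0, 1], so the logarithm is always defined.
    return a - b * math.log(1.0 - u)
```

Over many draws the sample mean should approach a+b and the sample variance
should approach b^2, as stated in the synopsis.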
See Also:
?Beta_Rand ?FDist_Rand ?Normal_Rand ?StatTest
?Binomial_Rand ?GammaDist_Rand ?Poisson_Rand ?Std_Score
?ChiSquare_Rand ?Geometric_Rand ?SetRand ?Student_Rand
?CreateRandSeq ?Graph_Rand ?SetRandSeed ?Zscore
?Cumulative ?Multinomial_Rand ?Shuffle
ExtendClass
Function ExtendClass - Extend a class with additional fields
Calling Sequence: ExtendClass(newclass,oldclass,addarg1,...)
Parameters:
Name Type Description
------------------------------------------------------------------------------
newclass symbol The new class, oldclass with more fields
oldclass symbol The base class being extended
addarg1 [symbol, type, anything] Description of additional arguments
Returns:
NULL
Synopsis: ExtendClass creates a new class which has all the fields of the
base class plus additionally defined ones. The result is a new class which
automatically inherits all the methods of the oldclass and has additional
fields described in the third and subsequent arguments of ExtendClass. The
description of each additional argument is a list of three values, the name
of the new field, its type and, optionally, its default value. The default
value is used when creating an object without it or when converting an
object from oldclass to newclass. More precisely the following functions
are created:
Old method New Method Comment
------------------------------------------------------------------------------
oldclass newclass Constructor based on the oldclass constructor
oldclass_xxx newclass_xxx same rules as Inherit
yyy_oldclass yyy_newclass conversions from other classes to newclass
oldclass_newclass widening conversion
newclass_oldclass narrowing conversion
If some methods are not expected to be inherited from oldclass, they should
either be unevaluated after calling ExtendClass or defined before calling
ExtendClass.
ExtendClass does an implicit Inherit, so there is no point in doing an Inherit
(newclass,oldclass). Any protection defined for the oldclass is inherited
in the newclass. The newclass can Inherit other additional classes as
usual.
Examples:
> ExtendClass( DistTree, Tree, [height,numeric,0] );
See also: ?CompleteClass ?Inherit ?objectorientation ?Protect
FDist_Rand
Function FDist_Rand - Generate random F-(variance-ratio) distributed reals
Calling Sequence: Rand(FDist(nu1,nu2))
Parameters:
Name Type
------------------
nu1 nonnegative
nu2 nonnegative
Returns:
nonnegative
Synopsis: This function returns a random F distributed or Variance-ratio
distributed number with average nu1/(nu2-2). If X1 and X2 are Chi-square
distributed variables with parameters nu1 and nu2, then X1/X2 is distributed
as FDist(nu1,nu2). This distribution has a non-finite expected value for
nu2<=2 and non-finite variance for nu2<=4. FDist_Rand uses Rand() which can
be seeded by either the function SetRand or SetRandSeed.
References: Handbook of Mathematical functions, Abramowitz and Stegun, 26.6
Examples:
> Rand(FDist(3,1));
1.9703
> Rand(FDist(1,100));
0.02513195
See Also:
?Beta_Rand ?Exponential_Rand ?Normal_Rand ?StatTest
?Binomial_Rand ?GammaDist_Rand ?Poisson_Rand ?Std_Score
?ChiSquare_Rand ?Geometric_Rand ?SetRand ?Student_Rand
?CreateRandSeq ?Graph_Rand ?SetRandSeed ?Zscore
?Cumulative ?Multinomial_Rand ?Shuffle
FileStat
Class FileStat - the unix file status structure
Template: FileStat(path)
Fields:
Name Type Description
----------------------------------------------------
path string a filename or a path
st_dev posint device
st_ino posint inode
st_mode posint protection
st_nlink posint number of hard links
st_uid integer user ID of owner
st_gid integer group ID of owner
st_rdev integer device type (if inode device)
st_size integer total size, in bytes
st_blksize posint blocksize for filesystem I/O
st_blocks integer number of blocks allocated
st_atime posint time of last access
st_mtime posint time of last modification
st_ctime posint time of last change
Returns:
FileStat
Methods: FileStat_type
Synopsis: This class stores the unix stat structure, see "man 2 stat" in any
unix system for details. When called with a single argument, it constructs
the entire structure. The unix names have been retained for the fields.
This operation is very efficient, it only requires reading the directory and
completes without the execution of any system command. Hence it is the
recommended way of finding any information about a file. When the file does
not exist, an empty data structure is returned.
Examples:
> FileStat(libname)[st_size];
163840
> FileStat('/dev/null')[st_mtime];
1349038461
> FileStat(non_existing_file);
FileStat()
See Also:
?inputoutput ?OpenReading ?ReadLine ?SearchDelim
?LockFile ?OpenWriting ?ReadRawFile ?SplitLines
?OpenAppending ?ReadData ?ReadRawLine
FindCircularOrder
Function FindCircularOrder - list of Leaf labels in lexicographical order
Calling Sequence: FindCircularOrder(t)
Parameters:
Name Type Description
-------------------------
t Tree input tree
Returns:
list
Synopsis: Find a circular order of a tree, in particular, a lexicographical
order of a Tree.
Examples:
> tree := Tree(Tree(Tree(Leaf(f9,-90.4683,372),-89.6 ..(219).. 572,367))):
> FindCircularOrder(tree);
[f9, e8, e7, e6, e5, e4, f9]
See also: ?CircularTour ?Clusters ?Leaf ?Leaves ?Tree
FindConnectedComponents
Function FindConnectedComponents - set of connected components of a Graph
Calling Sequence: FindConnectedComponents(G)
Parameters:
Name Type Description
----------------------------
G Graph a given Graph
Returns:
set(Graph)
Synopsis: This function computes the set of connected components of a Graph,
which are returned as a set of Graphs. The Graph is assumed to be
undirected. The disconnected nodes are returned as singleton Graphs, i.e. a
Graph with a single node.
Examples:
> G1 := Graph(Edges(Edge('a',1,2),Edge('b',2,3)), Nodes(1,2,3,4)):
> FindConnectedComponents(G1);
{Graph(Edges(),Nodes(4)),Graph(Edges(Edge(a,1,2),Edge(b,2,3)),Nodes(1,2,3))}
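Note: the following is an illustrative Python sketch, not Darwin code and not
necessarily the algorithm used by FindConnectedComponents. It finds the
connected components of an undirected graph by breadth-first search. The name
connected_components and the plain (nodes, edges) representation are
hypothetical stand-ins for the Graph structure.

```python
from collections import deque

def connected_components(nodes, edges):
    """Return a list of node sets, one per connected component."""
    adj = {v: set() for v in nodes}
    for u, v in edges:               # undirected: record both directions
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        comp = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        components.append(comp)      # an isolated node yields a singleton
    return components
```

For the graph of the example above (nodes 1 to 4, edges 1-2 and 2-3) the
sketch yields the node sets {1, 2, 3} and {4}.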
See Also:
?BipartiteGraph ?Graph ?MaxEdgeWeightClique ?RegularGraph
?Clique ?Graph_minus ?MinCut ?ShortestPath
?DrawGraph ?Graph_Rand ?MST ?TetrahedronGraph
?Edge ?Graph_XGMML ?Nodes ?VertexCover
?EdgeComplement ?InduceGraph ?ParseDimacsGraph
?Edges ?MaxCut ?Path
FindEntropy
Function FindEntropy
Calling Sequence: FindEntropy(day)
Parameters:
Name Type
----------------
day DayMatrix
Returns:
numeric
Synopsis: Computes the relative entropy H of day, i.e. how many bits of
information are available per position of an alignment. See S.F. Altschul,
"Amino Acid Substitution Matrices from an Information Theoretic Perspective"
, JMB 219(1991):555-565.
Examples:
> CreateDayMatrices();
> FindEntropy(DMS[1]);
4.1819
> FindEntropy(DMS[500]);
0.2829
> FindEntropy(DMS[1000]);
0.00618796
FindHighlyExpressedGenes
Function FindHighlyExpressedGenes - Find genes with high expression
Calling Sequence: FindHighlyExpressedGenes([e])
Returns:
list
Synopsis: Experimental expression data must be available in the entries.
See also: ?ComputeCAI ?SetupRA
FindLongestRep
Function FindLongestRep
Option: builtin
Calling Sequence: FindLongestRep(db)
FindLongestRep(db,len)
FindLongestRep(db,len,eb)
Parameters:
Name Type
---------------
db database
len integer
eb integer
Returns:
string
Synopsis: Find the longest repetition(s) in the database db. If len is
specified, then return only those repetitions longer than len. If len and
eb (the end bonus) are specified, then return repetitions longer than
len - eb when matching to the end of both sequences. This command requires
that a pat index has been built for the database db.
FindNucPepPam
Function FindNucPepPam - Compute Pam estimate for a NucPepMatch
Option: builtin
Calling Sequence: FindNucPepPam(npm,DMS)
Parameters:
Name Type
-----------------------
npm NucPepMatch
DMS array(DayMatrix)
Returns:
NULL
Synopsis: Computes the best pam estimate and its variance for the given
NucPepMatch.
Examples:
See Also:
?AlignNucPepAll ?GlobalNucPepAlign ?NucPepBackDynProg
?AlignNucPepMatch ?LocalNucPepAlign ?NucPepDynProg
?DynProgNucPepString ?LocalNucPepAlignBestPam ?NucPepMatch
FindRules
Function FindRules( t:Tree )
Checks the tree for any rules in the form:
a is closer to b than to c and returns a list of those rules.
FindSpeciesViolations
Function FindSpeciesViolations( arg:anything )
arg: a Tree or a list.
If it is a tree, it must contain information (6, 7) about species.
Use AddSpecies to get such a tree.
From this tree a list of rules is generated (a closer to b than to c etc).
If it is a list of those rules ([a, {b, c}], [d, {e, f}], ...) a list of
contradictions is returned
GOdefinition
Function GOdefinition - returns the definition of a Gene Ontology
Calling Sequence: GOdefinition(go)
Parameters:
Name Type Description
------------------------------------
go {posint,string} GO number
Returns:
string
Synopsis: Returns a longer definition describing a GO number. The argument
can either be a number or a string of the form 'GO:002354'.
Examples:
> GOdefinition(23);
The chemical reactions and pathways involving the disaccharide maltose (4-O-alpha-D-glucopyranosyl-D-glucopyranose), an intermediate in the catabolism of glycogen and starch
See Also:
?GOdownload ?GOnumber ?GOsubclassR ?GOsuperclassR
?GOname ?GOsubclass ?GOsuperclass
GOdownload
Function GOdownload - downloads the gene ontologies and converts them to a
Darwin readable format
Calling Sequence: GOdownload
Returns:
NULL
Synopsis: Downloads the gene ontologies from http://www.geneontology.org/
ontology/gene_ontology.obo and converts them to Darwin tables that are
stored in the file GOdata.drw which is located in Darwin's data directory.
See Also:
?GOdefinition ?GOnumber ?GOsubclassR ?GOsuperclassR
?GOname ?GOsubclass ?GOsuperclass
GOname
Function GOname - returns the name of a Gene Ontology
Calling Sequence: GOname(go)
Parameters:
Name Type Description
------------------------------------
go {posint,string} GO number
Returns:
string
Synopsis: Returns the name for a GO number. The argument can either be a
number or a string of the form 'GO:001369'.
Examples:
> GOname(23);
maltose metabolic process
See Also:
?GOdefinition ?GOnumber ?GOsubclassR ?GOsuperclassR
?GOdownload ?GOsubclass ?GOsuperclass
GOnumber
Function GOnumber - returns the GO number of a Gene Ontology term
Calling Sequence: GOnumber(desc)
Parameters:
Name Type Description
---------------------------
desc string GO name
Returns:
integer
Synopsis: Returns the GO number corresponding to a GO name. This function is
the inverse of GOname().
Examples:
> GOnumber('metabolic process');
8152
See Also:
?GOdefinition ?GOname ?GOsubclassR ?GOsuperclassR
?GOdownload ?GOsubclass ?GOsuperclass
GOsubclass
Function GOsubclass - returns all subclasses for a Gene Ontology
Calling Sequence: GOsubclass(go)
GOsubclass(go,links = {can_be,has_parts})
Parameters:
Name Type Description
----------------------------------------------------------------------------
go {posint,string} GO number
links set(string) types of links to follow (can_be and/or has_parts)
Returns:
list(integer)
Synopsis: Returns all subclasses for a Gene Ontology. This is the inverse of
the 'is_a' and 'part_of' relationship. The argument can either be a number
or a string of the form 'GO:009594'. The optional named argument 'links' can
be used to restrict the type of relationships using 'can_be' or 'has_parts'.
Examples:
> GOname(48311);
mitochondrion distribution
> GOsubclass(48311);
[1, 48312]
> for t in GOsubclass(48311) do print(GOname(t)) od;
mitochondrion inheritance
intracellular distribution of mitochondria
See Also:
?GOdefinition ?GOname ?GOsubclassR ?GOsuperclassR
?GOdownload ?GOnumber ?GOsuperclass
GOsubclassR
Function GOsubclassR - recursive calls to GOsubclass
Calling Sequence: GOsubclassR(go)
GOsubclassR(go,links = {can_be})
Parameters:
Name Type Description
----------------------------------------------------------------------------
go {posint,string} GO number
links set(string) types of links to follow (can_be and/or has_parts)
Returns:
list(integer)
Synopsis: Recursively calls GOsubclass to find all subclasses for a Gene
Ontology. The argument can either be a number or a string of the form
'GO:001819'.
Examples:
> GOname(7005);
mitochondrion organization
> GOsubclassR(7005);
[1, 2, 266, 1836, 1844, 6264, 6390, 6391, 6392, 6393, 6626, 6627, 7006, 7007, 7008, 7287, 8053, 8637, 30150, 30382, 32042, 32043, 32543, 32976, 32979, 32981, 33108, 33615, 33617, 33955, 34551, 34553, 42407, 42792, 43504, 43653, 45039, 45040, 45041, 45042, 45043, 45044, 46902, 48311, 48312, 51204, 70096, 70124, 70125, 70126, 70127, 70143, 70144, 70145, 70146, 70147, 70148, 70149, 70150, 70151, 70152, 70153, 70154, 70155, 70156, 70157, 70158, 70159, 70183, 70184, 70185, 70584]
See Also:
?GOdefinition ?GOname ?GOsubclass ?GOsuperclassR
?GOdownload ?GOnumber ?GOsuperclass
GOsuperclass
Function GOsuperclass - returns all superclasses for a Gene Ontology
Calling Sequence: GOsuperclass(go)
GOsuperclass(go,links = {is_a})
Parameters:
Name Type Description
------------------------------------------------------------------------
go {posint,string} GO number
links set(string) types of links to follow (is_a and/or part_of)
Returns:
list(integer)
Synopsis: Returns all superclasses for a Gene Ontology. This represents in
the default case both the 'is_a' and 'part_of' relationship. The argument
can either be a number or a string of the form 'GO:005951'. The optional
named argument 'links' can be used to restrict the type of relationships to
one of them.
Examples:
> GOname(1);
mitochondrion inheritance
> GOsuperclass(1);
[48308, 48311]
> for t in GOsuperclass(1) do print(GOname(t)) od;
organelle inheritance
mitochondrion distribution
See Also:
?GOdefinition ?GOname ?GOsubclass ?GOsuperclassR
?GOdownload ?GOnumber ?GOsubclassR
GOsuperclassR
Function GOsuperclassR - recursive calls to GOsuperclass
Calling Sequence: GOsuperclassR(go)
GOsuperclassR(go,links = {is_a})
Parameters:
Name Type Description
-----------------------------------------------------------------------
go {posint,string} GO number
links set(string) types of links to follow (is_a and/or part_of)
Returns:
list(integer)
Synopsis: Recursively calls GOsuperclass to find all superclasses for a Gene
Ontology. The argument can either be a number or a string of the form
'GO:008085'.
Examples:
> GOname(1);
mitochondrion inheritance
> GOsuperclassR(1);
[6996, 7005, 8150, 9987, 16043, 48308, 48311, 51179, 51640, 51641, 51646]
> for t in GOsuperclassR(1) do print(GOname(t)) od;
organelle organization
mitochondrion organization
biological_process
cellular process
cellular component organization
organelle inheritance
mitochondrion distribution
localization
organelle localization
cellular localization
mitochondrion localization
See Also:
?GOdefinition ?GOname ?GOsubclass ?GOsuperclass
?GOdownload ?GOnumber ?GOsubclassR
Gamma
Function Gamma - the Gamma and Incomplete Gamma functions
Calling Sequence: Gamma(a)
Gamma(a,x)
Parameters:
Name Type Description
-------------------------------------------------------------------------
a numeric a numerical value
x numeric a nonnegative argument for the Incomplete Gamma function
Returns:
numeric
Synopsis: For a positive integer a, Gamma(a) returns the product of 1*2*3*...
*(a-1) = (a-1)!. Gamma satisfies the functional equation:
Gamma(a+1) = a*Gamma(a)
Gamma can be defined as a definite integral:
Gamma(a) = integral from t = 0 to infinity of t^(a-1) * exp(-t) dt
For non-integer values it is also possible to define Gamma for negative
arguments. When Gamma is used with two arguments, it is understood to be
the Incomplete Gamma function, defined by the integral:
Gamma(a, x) = integral from t = x to infinity of t^(a-1) * exp(-t) dt
In this case, a must be positive.
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 6.1, 6.5.3
Examples:
> Gamma(7);
720
> Gamma(100);
9.3326215443944096e+155
> Gamma(-1.5);
2.3633
> Gamma(3,2);
1.3534
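Note: the following is an illustrative Python sketch, not Darwin code. It
reproduces the example values above with math.gamma from the Python standard
library, and evaluates the Incomplete Gamma function only for the special case
of a positive integer a, using the closed form
Gamma(a, x) = (a-1)! * exp(-x) * sum(x^k / k!, k = 0..a-1). The name
upper_incomplete_gamma_int is hypothetical.

```python
import math

def upper_incomplete_gamma_int(a, x):
    """Gamma(a, x) for a positive integer a and x >= 0 (closed form)."""
    if not (isinstance(a, int) and a >= 1):
        raise ValueError("this sketch only handles positive integer a")
    if x < 0:
        raise ValueError("x must be nonnegative")
    term = 1.0                       # x^0 / 0!
    total = 1.0
    for k in range(1, a):
        term *= x / k                # x^k / k! from the previous term
        total += term
    return math.factorial(a - 1) * math.exp(-x) * total

print(math.gamma(7))                        # (7-1)! = 720
print(math.gamma(-1.5))                     # negative non-integer argument
print(upper_incomplete_gamma_int(3, 2))     # 10 * exp(-2)
```

With x = 0 the closed form reduces to (a-1)!, in agreement with Gamma(a).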
See also: ?factorial ?LnGamma ?Lngamma
GammaDist_Rand
Function GammaDist_Rand - Generate random Gamma distributed reals
Calling Sequence: Rand(GammaDist(p))
Parameters:
Name Type
------------------
p nonnegative
Returns:
nonnegative
Synopsis: This function returns a random Gamma distributed number with
average p and variance p. The sum of two Gamma distributed random variables
with parameters p and q is a Gamma distributed variable with parameter p+q.
We have to call this function GammaDist to prevent the collision with the
Gamma function. GammaDist_Rand uses Rand() which can be seeded by either
the function SetRand or SetRandSeed.
References: Handbook of Mathematical functions, Abramowitz and Stegun,
26.1.32
Examples:
> Rand(GammaDist(3));
4.2584
> Rand(GammaDist(100));
104.3901
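Note: the following is an illustrative Python sketch, not Darwin code and not
necessarily how GammaDist_Rand is implemented. It uses random.gammavariate
from the Python standard library with scale 1, which gives mean p and variance
p, and requires p > 0. The name gamma_dist_rand and the rng parameter are
hypothetical.

```python
import random

def gamma_dist_rand(p, rng=random):
    """Return one Gamma distributed sample with shape p and scale 1."""
    if p <= 0:
        raise ValueError("this sketch requires p > 0")
    return rng.gammavariate(p, 1.0)

# The sum of independent samples with parameters p and q behaves like a
# sample with parameter p+q: its mean and variance both approach p+q.
rng = random.Random(7)
sums = [gamma_dist_rand(2, rng) + gamma_dist_rand(3, rng) for _ in range(100000)]
mean = sum(sums) / len(sums)
var = sum((s - mean) ** 2 for s in sums) / (len(sums) - 1)
```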
See Also:
?Beta_Rand ?Exponential_Rand ?Normal_Rand ?StatTest
?Binomial_Rand ?FDist_Rand ?Poisson_Rand ?Std_Score
?ChiSquare_Rand ?Geometric_Rand ?SetRand ?Student_Rand
?CreateRandSeq ?Graph_Rand ?SetRandSeed ?Zscore
?Cumulative ?Multinomial_Rand ?Shuffle
Gap
Data structure Gap( Pos:posint, Len:posint, Seq:integer, Flag:integer )
Function: creates a gap data structure
The Gap starts at position Pos and is of length Len.
Selectors:
Pos, - the position where the gap starts
Len, - the length of the gap
Seq, - the sequence number where the gap was found
Flag - 1 if the gap appears identically in another sequence
GapHeuristic
Data structure GapHeuristic( )
Function: creates a gap heuristic data structure
Selectors:
Type: can be ALL, FUSION, ISLAND, STACKING, GAPSHIFT, STACKSHIFT, ISLANDSHIFT, RANDOM
default value: ALL (type 1)
GAP PARAMETERS:
Mingaplen: Minimum length of gaps to process.
default value: 1
Maxgaplen: Maximum length of gaps to process. Values < 0 mean
unlimited. default value: -1
Maxgaps: maximum number of gaps that can be fused
default value: 2 (should be from 1 - 5)
Gapdelta: maximum allowed difference in gap sum. This should be
small. If it is greater than zero, then the gap sum in
a block can vary by this value. default value: 1
Stackdelta: maximum number of amino acids to left and right of a
given block where other blocks should be sought.
This is needed for the stacking of gap-blocks.
default value: 20
ISLAND PARAMETERS:
Minislandlen: minimum length of an island to group and shift around
(aa between two gaps)
default value: 1
Maxislandlen: maximum length of an island to group and shift around
default value: 10
Islanddelta: maximum variation of the island length in number of
amino acids. Should be zero. default value: 0
LEFT RIGHT SHIFTING AND RANDOM SHIFTING PARAMETERS:
Window: This value is needed for the shifting of gap blocks to
the left and right, AND for the random shifting
(meandistance for shifting). Each block in an alignment
is shifted by this to left and right, and all positions
in between are checked for a better score
default value: 5
Times: How many times each gapblock is randomly shifted
default value: 5
OTHER PARAMETERS:
Maxaa: maximum sum of number of amino acids between gaps
default value: 10
Extension: maximum number of amino acids to the left/right of a
gaprow where the program should look for other gaps
default value: 10
Flag: NORMAL: the values stay the same in each round
INCREMENTAL: in each round the values are increased
RANDOM: random values are used each round - the maximum
values are the ones initially set
default value: NORMAL (1)
Counter: Maximum number of times the heuristics should be repeated.
Values < 0 mean unlimited, i.e.
until the score no longer increases
default value: 10
MaxBlocks: Maximum number of blocks to process.
If the number of blocks is very large ( > 100) then it can
take very long to compute the alignment.
In this case the parameters should be decreased.
Values < 0 mean unlimited.
default value: 100
GapMatch
Data structure GapMatch( )
Function: creates a data structure to keep a GapMatch
Selectors:
align1: alignment string of first sequence
align2: alignment string of second sequence
seq1: sequence 1
seq2: sequence 2
Pam: Pam distance
len: length of alignment
score: similarity score
mid: middle string (match string with |, ! and : etc)
iden: identity
Constructors:
GapMatch(seq1, seq2);
GapTree
Function GapTree - build a phylogenetic tree based on gaps
Calling Sequence: GapTree(msa,...)
Parameters:
Name Type Description
-----------------------------------------------------------------
msa MAlignment one or many MAlignments over the same species
Returns:
Tree
Global Variables: GapTree_Title
Synopsis: GapTree builds a phylogenetic tree based on the gaps of one/several
multiple sequence alignments. The assumption is that gap creation is a
sufficiently rare event, which allows us to build better trees for longer
distances. The gaps are extracted from MAlignments given as arguments.
Only single gaps which are clearly delimited are used. Areas in which no
sequence is gap-less are not considered. Areas where sequences have two
gaps are also discarded. The existence/non-existence of gaps is then fed to
a parsimony algorithm to produce a tree. The input MAlignments should be on
the same set of labels. More specifically, we expect the MAlignments to be
over different sets of sequences belonging to the same set of species,
identified by the same list of labels.
The global variable GapTree_Title is set to a short description of the details
of the construction.
See Also:
?BootstrapTree ?Entry ?LeastSquaresTree ?Sequence ?Synteny
?DrawTree ?Leaf ?MAlignment ?SignedSynteny ?Tree
GaussElim
Function GaussElim
Option: builtin
Calling Sequence: GaussElim(A,b)
Parameters:
Name Type
----------------------
A matrix(numeric)
b array(numeric)
Returns:
a vector (one dimensional array) of numeric
Synopsis: Given a matrix of numerical values A and vector b, this function
computes x so that A * x = b by Gaussian elimination. A must be a square
numerical matrix.
Examples:
> GaussElim([[2,4,6], [9,0,27],[17,23,5]], [8, 15, 17]);
[-0.8225, 1.1667, 0.8297]
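Note: the following is an illustrative Python sketch, not Darwin code and not
necessarily the exact procedure used by the builtin. It solves A * x = b by
Gaussian elimination with partial pivoting followed by back substitution. The
name gauss_elim is hypothetical.

```python
def gauss_elim(A, b):
    """Solve A * x = b for a square numerical matrix A."""
    n = len(A)
    if any(len(row) != n for row in A) or len(b) != n:
        raise ValueError("A must be square and b must match its size")
    # Work on an augmented copy [A | b] so the inputs stay untouched.
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x
```

Applied to the matrix and vector of the example above, the sketch agrees with
the printed result to the four decimals shown.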
See Also:
?Cholesky ?GivensElim ?matrix
?convolve ?Identity ?matrix_inverse
?Eigenvalues ?LinearProgramming ?transpose
Gene
Class Gene
Template: Gene(Division,NucEntry,Exons,PepOffset,AlignErrors)
Fields:
Name Type Description
-----------------------------------------------------------
Division string
NucEntry integer
Exons list(posint..posint)
PepOffset PepLength
AlignErrors integer
Division
NucEntry
Exons list of exon locations
Introns
mRNA
NucSequence
PepOffset
PepLength
PepSequence
AlignErrors
Returns:
Gene
Methods: Gene_type NucPepMatch print select
Synopsis: Data structure defining gene-peptide references.
Examples:
See also: ?NSubGene ?NucPepMatch ?PSubGene
GenomeSummary
Class GenomeSummary - summary information of a database file
Template: GenomeSummary(DB)
Fields:
Name Type Description
----------------------------------------------------------------------------
DB database database structure to create a summary
FileName string name of external file containing the database
string string the entire header of the database as a string
TotAA posint number of amino acids or bases in the database
TotChars posint number of characters in the database
TotEntries posint number of entries in the database
type string dna, rna, mixed or peptide
EntryLengths list(posint) length of each entry
Id string 5-letter code (SwissProt) for species/genome
Kingdom string either Bacteria, Archaea or Eukaryota
Lineage list(string) Lineage as a list (from OS tags)
Genus string First part of the scientific name
Epithet string Second part of the scientific name
sgml_tag string The contents of the tag in the database header
Returns:
GenomeSummary
Methods: GenomeSummary_type print Rand select string
Synopsis: GenomeSummary provides an alternative to loading a database when
the sequences themselves are not needed. Typically, the database is loaded,
then GenomeSummary is run and its results are stored in a file for later
reading. In this way, all of the data except for the sequences themselves,
is available and many genomes can be loaded into a darwin session.
GenomeSummary has all the selectors which are available for a database (except
for Entry and Pat, which can only be used if the sequences are present).
It also provides a few additional selectors. The EntryLengths selector
contains the length of the sequence of each entry. The string selector
does not select the entire text of the database, just the text that is
before the first entry. This is normally called the header of the database.
In the header there are several useful tags which describe the entire
database, for example, 5-letter code, kingdom, lineage, etc. This
information is available directly through selectors. Any other tagged
information in the header can be selected with the name of the tag as a
selector.
Examples:
> ReadDb('/home/darwin/DB/genomes/ECOLI/ECOLI.db'):
> gs := GenomeSummary(DB):
> gs[TotAA];
1358990
> gs[Lineage];
[Bacteria, Proteobacteria, Gammaproteobacteria, Enterobacteriales,
Enterobacteriaceae, Escherichia, Escherichia coli]
> print(gs);
FileName: /home/darwin/DB/genomes/ECOLI/ECOLI.db
string: Escherichia coli K-12 MG1655 complete genome.
Geometric_Rand
Function Geometric_Rand - Generate random geometrically distributed integers
Calling Sequence: Rand(Geometric(p))
Returns:
integer
Synopsis: This function returns a random geometrically distributed integer
with average (1-p)/p and variance (1-p)/p^2. In mathematical terms, the
probability that the outcome is i is p*(1-p)^i (for 0 <= i). Notice that
the distribution starts at 0. Geometric_Rand uses Rand() which can be
seeded by either the function SetRand or SetRandSeed.
References: Handbook of Mathematical functions, Abramowitz and Stegun,
26.1.24
Examples:
> Rand(Geometric(0.3));
4
> Rand(Geometric(0.01));
51
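Note: the following is an illustrative Python sketch, not Darwin code and not
necessarily how Geometric_Rand is implemented. It draws an integer i >= 0 with
probability p*(1-p)^i by inverting the cumulative distribution. The name
geometric_rand and the rng parameter are hypothetical.

```python
import math
import random

def geometric_rand(p, rng=random):
    """Return an integer i >= 0 with probability p * (1-p)**i."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    if p == 1:
        return 0                     # all probability mass sits at 0
    u = 1.0 - rng.random()           # uniform on (0, 1], avoids log(0)
    # P(result >= k) = P(u <= (1-p)**k) = (1-p)**k, as required.
    return int(math.floor(math.log(u) / math.log(1.0 - p)))
```

Over many draws the sample mean should approach (1-p)/p, and the smallest
value that can occur is 0, as noted in the synopsis.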
See Also:
?Beta_Rand ?Exponential_Rand ?Normal_Rand ?StatTest
?Binomial_Rand ?FDist_Rand ?Poisson_Rand ?Std_Score
?ChiSquare_Rand ?GammaDist_Rand ?SetRand ?Student_Rand
?CreateRandSeq ?Graph_Rand ?SetRandSeed ?Zscore
?Cumulative ?Multinomial_Rand ?Shuffle
GetAaCount
Function GetAaCount
Calling Sequence: GetAaCount(db)
Parameters:
Name Type
---------------
db database
Returns:
list(numeric,20)
Synopsis: This function counts the number of occurrences of each of the
twenty amino acids. It returns a list in the standard amino acid order.
This function requires that a patricia tree has been created for the
database assigned to DB.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> amino_acid_counts := GetAaCount(DB);
amino_acid_counts := [4667613, 3174685, 2506004, 3164021, 932902, 2350027, 3935943, 4140146, 1358718, 3522464, 5740368, 3536945, 1416890, 2392860, 2893327, 4101839, 3256308, 692928, 1836010, 4004568]
See also: ?GetAaFrequency
GetAaFrequency
Function GetAaFrequency
Calling Sequence: GetAaFrequency(db)
Parameters:
Name Type
---------------
db database
Returns:
NULL
Synopsis: This procedure computes the percent amino acid frequencies of the
database assigned to db. It prints out the results in a nice format. This
function requires that a patricia tree has been created for the database
assigned to db.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> GetAaFrequency(DB);
Alanine 7.83 %
Arginine 5.32 %
Asparagine 4.20 %
Aspartic acid 5.31 %
Cysteine 1.56 %
Glutamine 3.94 %
Glutamic acid 6.60 %
Glycine 6.94 %
Histidine 2.28 %
Isoleucine 5.91 %
Leucine 9.63 %
Lysine 5.93 %
Methionine 2.38 %
Phenylalanine 4.01 %
Proline 4.85 %
Serine 6.88 %
Threonine 5.46 %
Tryptophan 1.16 %
Tyrosine 3.08 %
Valine 6.72 %
unknown 0.01 %
GetAllNucPepMatches
Function GetAllNucPepMatches
Option: builtin
Calling Sequence: GetAllNucPepMatches(npm,D,goal)
Parameters:
Name Type
------------------
npm NucPepMatch
D DayMatrix
goal numeric
Returns:
list
Synopsis: Return the list of all NucPepMatch between the nucleotide and the
peptide sequences in npm reaching goal.
Examples:
See also: ?GetAllMatches ?NucPepMatch
GetComplement
Function GetComplement
Calling Sequence: GetComplement(nuc)
Parameters:
Name Type Description
-----------------------------------------
nuc string a string of DNA/RNA bases
Returns:
string
Global Variables: CO_Cache
Synopsis: Computes the complementary DNA/RNA strand for the given sequence.
Examples:
> GetComplement('ACTTACG');
CGTAAGT
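Note: as the example above shows, the result is the complementary strand read
in the reverse direction (the reverse complement). The following is an
illustrative Python sketch, not Darwin code, restricted to upper-case DNA
bases; the name reverse_complement is hypothetical.

```python
# Watson-Crick pairing for DNA bases (upper case only in this sketch).
_PAIR = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def reverse_complement(nuc):
    """Return the complementary DNA strand, read in reverse direction."""
    return ''.join(_PAIR[base] for base in reversed(nuc))

print(reverse_complement('ACTTACG'))
```

Applying the function twice returns the original sequence.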
See Also:
?AltGenCode ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB
?AminoToInt ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt
?antiparallel ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon
?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase
GetEntryInfo
Function GetEntryInfo - selected tag information from a database entry
Calling Sequence: GetEntryInfo(EntryDescr,tag1,tag2)
Parameters:
Name Type Description
-----------------------------------------------------------------------------
EntryDescr {integer,list,string} an entry, entry offset or a list of same
tag1 string
tag2 optional tags
Returns:
expseq(string)
Synopsis: Return the information tags (tag1 and additional optional tags) for
an entry given by offset or several entries given by an Entry data
structure. The function returns an expression sequence of strings, two
elements for each entry and tag - the first being the tag and the second
being the information for that tag.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> GetEntryInfo(100,'DE');
DE, 104 kDa microneme-rhoptry antigen.
> GetEntryInfo([Entry(1,2)],'AC','ID' );
AC, P15711;, ID, 104K_THEPA, AC, Q43495;, ID, 108_LYCES
See Also:
?Entry ?SearchAC ?SearchTag
?GetEntryNumber ?SearchID ?Species_Entry
GetEntryNumber
Function GetEntryNumber
Option: builtin
Calling Sequence: GetEntryNumber(offset,df)
Parameters:
Name Type Description
---------------------------------------------------------------------
offset {integer,string} an offset of an entry or an entry
df database optional - will default to DB if assigned
Returns:
integer
Synopsis: Return the number of the entry which contains the given offset in
df (default DB). If the argument is a string, it is assumed to be part of
the database - in which case the entry number including the beginning of the
string is returned.
Examples:
> a := Entry(1);
a := 104K_THEPAP15711;104 kDa ..(1255).. L
> GetEntryNumber(a);
1
> GetEntryNumber(34675449);
30873
See also: ?Entry ?GetEntryInfo ?GetOffset ?TextHead
GetFileInfo
Function GetFileInfo( CommentString:string )
Determines where, when, and by whom a file was
created.
GetGramRegionScore
Function GetGramRegionScore
Option: builtin
Calling Sequence: GetGramRegionScore(n,S)
Parameters:
Name Type
-----------------
n string
S GramRegion
Returns:
numeric
Synopsis: Computes k-gram region scores over nucleotide sequence n according
to S.
See also: ?GetGramRegion ?GetMolWeight ?GetMostFrequentGrams
GetGramSiteScore
Function GetGramSiteScore
Option: builtin
Calling Sequence: GetGramSiteScore(n,S)
Parameters:
Name Type
---------------
n string
S GramSite
Returns:
NULL
Synopsis: Computes k-gram site scores over nucleotide sequence n according to
S.
Examples:
See also: ?GetGramRegionScore ?GetGramSite ?GramSite
GetIntrons
Function GetIntrons
Calling Sequence: GetIntrons(m)
Parameters:
Name Type
------------------
m NucPepMatch
Returns:
list
Synopsis: Returns the introns derived from m. m[Introns] must be defined.
Examples:
See also: ?NucPepMatch
GetLcaSubtree
Function GetLcaSubtree( t )
Get all leaf numbers of tree t
GetMATreeNew
Function GetMATreeNew( MA:array(string) )
Estimates Dist and Var Matrices from an alignment
GetMachineUsage
Function GetMachineUsage
Calling Sequence: GetMachineUsage(logfile)
Parameters:
Name Type
----------------
logfile string
Returns:
NULL
Synopsis: Reads a log file created by ParExecuteIPC and produces a listing
containing all machines and the work they did, sorted by machine usage.
See Also:
?ConnectTcp ?ipcsend ?ParExecuteTest ?SendDataTcp
?darwinipc ?ParExecuteIPC ?ReceiveDataTcp ?SendTcp
?DisconnectTcp ?ParExecuteSlave ?ReceiveTcp
GetMolWeight
Function GetMolWeight
Calling Sequence: GetMolWeight(s)
Parameters:
Name Type Description
------------------------------------------------------------------
s {string,list(string)} an (or list of) amino acid sequence
Returns:
{numeric,list(numeric)}
Synopsis: This function computes the molecular weight of an amino acid
sequence or list of amino acid sequences.
Examples:
> GetMolWeight('IHGGCA');
556.6290
> GetMolWeight(['VTTWD', 'LIHAAG']);
[620.6250, 580.6720]
See also: ?GetMostFrequentGrams
GetMostFrequentGrams
Function GetMostFrequentGrams
Option: builtin
Calling Sequence: GetMostFrequentGrams(n,k)
Parameters:
Name Type
-------------
n posint
k posint
Returns:
NULL
Synopsis: This function prints out the n most frequent k-grams (sequences of
length k). It requires a database loaded at system variable DB.
Examples:
> GetMostFrequentGrams(5, 5);
The 5 most frequent strings of length 5 or longer are:
"GGGGG" occurs 4359 times (1997 without overlaps)
"EEEEE" occurs 4718 times (2075 without overlaps)
"SSSSS" occurs 4980 times (2320 without overlaps)
"AAAAA" occurs 5924 times (2793 without overlaps)
"QQQQQ" occurs 7032 times (2404 without overlaps)
See Also:
?DB ?GetGramRegion ?GetGramRegionScore ?GetMolWeight ?GramRegion
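The two counts reported above (total occurrences and occurrences without overlaps) can be illustrated on a plain string. The Python sketch below is only an illustration: the function names are hypothetical, it scans one in-memory string rather than a Darwin database, and it lists the most frequent k-gram first.

```python
from collections import Counter


def count_non_overlapping(text: str, gram: str) -> int:
    """Count occurrences of gram, skipping past each match before searching again."""
    count = 0
    pos = text.find(gram)
    while pos != -1:
        count += 1
        pos = text.find(gram, pos + len(gram))
    return count


def most_frequent_grams(text: str, n: int, k: int):
    """Return (gram, overlapping count, non-overlapping count) for the n most frequent k-grams."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    return [(g, c, count_non_overlapping(text, g)) for g, c in counts.most_common(n)]


print(most_frequent_grams("AAAAAAB", 1, 2))  # [('AA', 5, 3)]
```

Runs of a single letter show the largest gap between the two counts, which matches the homopolymer strings in the example output.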
GetOffset
Function GetOffset - Gets an offset in the database for a string
Option: builtin
Calling Sequence: GetOffset(seq)
Parameters:
Name Type Description
---------------------------------------------------
seq string a string in or outside the database
Returns:
integer
Synopsis: The GetOffset function finds the offset of a string whether it is
in the database or outside. It is necessary when we want to pretend that a
string is a sequence in the database to make it an argument of Match.
GetOffset requires that the system variable DB be assigned a sequence
database.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> CreateDayMatrices();
> s1 := 'MSRYEKMFARLNERNQGAFVPFVTVCDPNAEQSYKIMETLVESGADALELGIPFSDP':
> s2 := 'MLLLSVNPPLFIPFIVAGDPSPEVTVDLALALEEAGADLLELGVPYSDP':
> m3 := Match( GetOffset(s1), GetOffset(s2) );
m3 := Match(801680240,487437408)
See also: ?MAlign ?NucPepMatch ?ReadDb ?TotalAlign
GetPartitions
Function GetPartitions( )
Returns the splits or partitions of a data set or a tree.
The resulting data structure is a list of sets.
GetPathDistance
Function GetPathDistance( order:array )
order: order of tree or AllAll traversal
If the second argument is a tree, then the tree
is traversed in the given order and the length
(only in PAM units!) of the path is returned.
If the second argument is an array of Matches (AllAll), then
the AllAll is "traversed" in the given order and
the path length is returned. The score is always
divided by the length of the match.
The second argument may also be a distance matrix.
If a third argument is given ("PAM" or "SCORE"), the units
of the distance can be chosen (for the AllAll).
GetPeptides
Function GetPeptides
Calling Sequence: GetPeptides(m)
Parameters:
Name Type
------------------
m NucPepMatch
Returns:
NULL
Synopsis: Returns the peptide derived from m. m[NucGaps] and m[Introns] must
be defined. Note that amino acids derived from indels which are a multiple
of 3 do not always correspond to the reading frame implied by the alignment.
Examples:
See also: ?NucPepMatch
GetPosition
Function GetPosition
Calling Sequence: GetPosition(df,ofs)
Parameters:
Name Type
---------------
df database
ofs integer
Returns:
list
Global Variables: DB
Synopsis: Returns [pos, len] such that t is the complete sequence ofs is
pointing to after execution of 't := ofs + df[string]; t := t[1..len];'.
Examples:
See also:
GetSubTree_r
Function GetSubTree_r( t:Tree, i, j )
Get the subtree that has both leaves i and j, one in the left and one
in the right subtree
GetTreeLabels
Function GetTreeLabels
Calling Sequence: GetTreeLabels(t)
Parameters:
Name Type
-----------
t Tree
Returns:
list
Synopsis: This function returns a list of all the leaf labels present in t.
Examples:
> T := Tree( Leaf(a, 2), 0.5, Leaf(e, 1) );
T := Tree(Leaf(a,2),0.5000,Leaf(e,1))
> GetTreeLabels( T );
a, e
See also: ?Leaf ?Tree
GivensElim
Function GivensElim
Calling Sequence: GivensElim(A)
Parameters:
Name Type
----------------------------------
A an m x n numerical matrix
Returns:
[matrix, matrix] : [Q,R]
Synopsis: GivensElim factors an m x n matrix A into two factors, A = Q*R.
This decomposition is done with individual Givens' rotations. The
decomposition is commonly called the QR-decomposition. Q is an m x m square
orthonormal matrix, that is Q*Q^t = I. R is an m x n upper triangular
matrix. If the matrix is found to be singular, then R will have zeros in
the diagonal and the decomposition is still correctly done.
References: Computermathematik, Walter Gander, Birkhauser, Ch 5.3
Examples:
> GivensElim( [[1,2], [-2,3]] );
[[[-0.4472, -0.8944], [0.8944, -0.4472]], [[-2.2361, 1.7889], [0, -3.1305]]]
See Also:
?Cholesky ?GaussElim ?matrix
?convolve ?Identity ?matrix_inverse
?Eigenvalues ?LinearProgramming ?transpose
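The factorization described in the synopsis can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Darwin builtin: the name givens_qr is hypothetical, matrices are lists of rows, and the signs of Q and R may differ from Darwin's output (the product Q*R is the same either way).

```python
import math


def givens_qr(A):
    """Factor an m x n matrix A into Q (m x m, orthonormal) and R (m x n, upper triangular)."""
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(min(m - 1, n)):
        for i in range(j + 1, m):
            a, b = R[j][j], R[i][j]
            r = math.hypot(a, b)
            if r == 0.0:
                continue  # nothing to eliminate in this position
            c, s = a / r, b / r
            # Apply the rotation G to rows j and i of R, which zeroes R[i][j].
            for k in range(n):
                rj, ri = R[j][k], R[i][k]
                R[j][k] = c * rj + s * ri
                R[i][k] = -s * rj + c * ri
            # Accumulate Q := Q * G^t so that Q * R stays equal to A.
            for k in range(m):
                qj, qi = Q[k][j], Q[k][i]
                Q[k][j] = c * qj + s * qi
                Q[k][i] = -s * qj + c * qi
    return Q, R


Q, R = givens_qr([[1, 2], [-2, 3]])
```

Each rotation touches only two rows, which is why the method is convenient when only a few entries below the diagonal are nonzero.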
GlobalNucPepAlign
Function GlobalNucPepAlign
Calling Sequence: GlobalNucPepAlign(m,DM)
Parameters:
Name Type
------------------
m NucPepMatch
DM DayMatrix
Returns:
NucPepMatch
Global Variables: DB
Synopsis: Run the dynamic programming algorithm for the given Match with the
given DM matrix.
Examples:
See Also:
?AlignNucPepAll ?GetPeptides ?VisualizeGene
?FindNucPepPam ?LocalNucPepAlign ?VisualizeProtein
?Gene ?LocalNucPepAlignBestPam
?GetIntrons ?NucPepMatch
Globals
Function Globals - returns all global variables set inside a function
Calling Sequence: Globals(func)
Parameters:
Name Type Description
-------------------------------
func procedure the function
Returns:
set(symbol)
Synopsis: Globals returns all global variables that are set inside a
function. For functions inside modules and inside other functions Globals
returns exactly those global variables that are also visible to the user.
Variables that are only global inside a module are not reported.
Examples:
> Globals(CreateDayMatrices);
{AF,DM,DMS,logPAM1}
See also: ?local ?UnassignGlobals
GramRegion
Class GramRegion
Template: GramRegion(ProbI,ProbE,Extend,LogR0)
GramRegion(intCounts,totCounts,Extend,LogR0)
Fields:
Name Type
-----------------------
ProbI array(numeric)
ProbE array(numeric)
Extend integer
LogR0 numeric
Mean numeric
Min numeric
Max numeric
Returns:
structure(array,array,integer,numeric)
Methods: GramRegion_type print select
Synopsis: Structure to hold GramRegion scoring model data. If called with
array(integer) as the first two arguments (holding Counting data), it
returns the GramRegion data structure with ProbI and ProbE.
Examples:
See also: ?GetGramRegion ?GetMolWeight ?GetMostFrequentGrams
GramSchmidt
Function GramSchmidt
Calling Sequence: GramSchmidt(A)
Parameters:
Name Type
--------------------------------------------------------
A a list of linearly independent vectors (a matrix)
Returns:
matrix(numeric)
Synopsis: The GramSchmidt function computes an orthonormal basis spanning the
same subspace as the vectors in A. The input matrix A is interpreted as a
list of vectors. The vectors have to be all of the same dimension and
linearly independent. The result is a list of orthonormal vectors, with the
same dimension as A. If the dimension of A is m x n, then m <= n. This is
often called the Gram-Schmidt orthonormalization process.
Examples:
> GramSchmidt( [[1,2],[1,-1]] );
[[0.4472, 0.8944], [0.8944, -0.4472]]
> GramSchmidt( [[0,1,-1],[1,-1,3]] );
[[0, 0.7071, -0.7071], [0.5774, 0.5774, 0.5774]]
See Also:
?Cholesky ?GaussElim ?LinearProgramming ?transpose
?convolve ?GivensElim ?matrix
?Eigenvalues ?Identity ?matrix_inverse
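The orthonormalization process named in the synopsis can be sketched as follows. This Python version is an illustration only: gram_schmidt is a hypothetical name, and it uses the modified Gram-Schmidt update (projecting the running remainder), which may or may not be what Darwin does internally.

```python
import math


def gram_schmidt(vectors):
    """Return an orthonormal basis spanning the same subspace as the input vectors."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for u in basis:
            # Remove the component of w that points along the basis vector u.
            proj = sum(wi * ui for wi, ui in zip(w, u))
            w = [wi - proj * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            raise ValueError("input vectors are linearly dependent")
        basis.append([x / norm for x in w])
    return basis


basis = gram_schmidt([[0, 1, -1], [1, -1, 3]])
```

Rounded to four decimals, the result for the two inputs used in the examples above agrees with the listed output.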
GramSite
Class GramSite
Template: GramSite(Scores,LeftLen,LogR0)
GramSite(counts,totCounts,LeftLen,LogR0)
Fields:
Name Type
--------------------------------
Scores array(array(numeric))
LeftLen posint
LogR0 numeric
RightLen posint
Mean numeric
Min numeric
Max numeric
Returns:
structure(array,posint,numeric)
Methods: GramSite_type print select
Synopsis: Structure to hold k-gram scoring model data. When called with
counts and totCounts, it returns the Gram Site Scores
Examples:
See also: ?GetGramSite ?GetGramSiteScore
Graph
Class Graph - Data structure for storing a graph
Template: Graph(Edges,Nodes)
Fields:
Name Type Description
-----------------------------------------------------------------------------
Edges Edges description of edges
Nodes Nodes description of nodes (vertices)
Degrees list(integer) a list containing the degree of each node
Adjacencies list(list) an array with lists of adjacent nodes,
indexed by node number
Incidences list(list) an array with lists of incident edge
numbers, indexed by node number.
Distances matrix(numeric) a square matrix containing the distance
between pairs of nodes (assuming that the
edge's label is a distance or a list with
first element being a distance).
Disconnected nodes are at dist DBL_MAX.
Labels matrix a square matrix containing the labels of
edges between pairs of nodes.
Disconnected nodes get the label DBL_MAX.
AdjacencyMatrix matrix a square matrix containing 1s for each
edge and zeros otherwise. The matrix is
symmetric. The diagonal is zeroed.
Methods: dimacs display Graph_type minus plus Rand select Tree
union XGMML
Synopsis: A graph is represented by
Graph( Edges( Edge(lab1,n1,n2), ... ), Nodes( lab1, lab2, ... ) )
Where Edges describes the set of edges and Nodes describes the set of nodes.
Alternatively, and only as input, graphs can be represented with the
standard notation of set of vertices and sets of edges. In this case an
edge is represented as a set of two vertices. A node (or vertex) can be
represented by any valid object in Darwin. Usually integers are used.
Notice that the values of Edge must correspond to a node, hence if you use
complicated objects as nodes, these have to be replicated every time you
include them in an Edge.
If Graph is used with only a set of Edges, it deduces which are the Nodes from
the Edges.
Examples:
> Graph({a,b,c},{{a,b},{a,c},{b,c}});
Graph(Edges(Edge(0,a,b),Edge(0,a,c),Edge(0,b,c)),Nodes(a,b,c))
> Graph(Edges(Edge(0,10,20),Edge(0,10,35)), Nodes(0,10,20,35));
Graph(Edges(Edge(0,10,20),Edge(0,10,35)),Nodes(0,10,20,35))
> Graph(Edges(Edge(0,10,20),Edge(0,10,35)));
Graph(Edges(Edge(0,10,20),Edge(0,10,35)),Nodes(10,20,35))
See Also:
?BipartiteGraph ?Graph_Rand ?ParseDimacsGraph
?Clique ?Graph_XGMML ?Path
?DrawGraph ?InduceGraph ?RegularGraph
?Edge ?MaxCut ?ShortestPath
?EdgeComplement ?MaxEdgeWeightClique ?TetrahedronGraph
?Edges ?MinCut ?VertexCover
?FindConnectedComponents ?MST
?Graph_minus ?Nodes
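The derived fields in the table above (Degrees, Adjacencies, Incidences, AdjacencyMatrix) all follow from the edge list. The Python sketch below shows one way to compute them. It is an illustration under assumptions: edges are (label, n1, n2) tuples, edge numbers start at 1, and adjacency lists hold node labels; Darwin's own representation may differ.

```python
def graph_fields(edges, nodes):
    """Derive degrees, adjacencies, incidences and a 0/1 adjacency matrix from an edge list."""
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    adjacencies = [[] for _ in range(n)]
    incidences = [[] for _ in range(n)]
    matrix = [[0] * n for _ in range(n)]
    for number, (_label, a, b) in enumerate(edges, start=1):
        i, j = index[a], index[b]
        adjacencies[i].append(b)
        adjacencies[j].append(a)
        incidences[i].append(number)
        incidences[j].append(number)
        matrix[i][j] = matrix[j][i] = 1  # undirected graph, so the matrix is symmetric
    degrees = [len(adj) for adj in adjacencies]
    return degrees, adjacencies, incidences, matrix


degrees, adjacencies, incidences, matrix = graph_fields(
    [(0, 10, 20), (0, 10, 35)], [10, 20, 35]
)
print(degrees)  # [2, 1, 1]
```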
Graph_Rand
Function Graph_Rand - generate a random graph
Calling Sequence: Rand(Graph)
Graph_Rand(n,m)
Parameters:
Name Type Description
--------------------------------------------------
n integer optional number of nodes/vertices
m integer optional number of edges
Returns:
Graph
Synopsis: Generate a random undirected graph with n nodes and m edges. If m
is not specified, then the number of edges is <= n*ln(n). If n is not
specified, a random value between 5 and 20 is chosen. The Edges are all
labelled with 0.
Examples:
> Rand(Graph);
Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,1,9),Edge(0,1,10),Edge(0,1,13),Edge(0,1,15),Edge(0,2,4),Edge(0,2,7),Edge(0,2,8),Edge(0,2,9),Edge(0,2,10),Edge(0,2,13),Edge(0,2,14),Edge(0,2,15),Edge(0,2,16),Edge(0,3,5),Edge(0,3,10),Edge(0,3,11),Edge(0,3,15),Edge(0,3,16),Edge(0,4,8),Edge(0,4,9),Edge(0,4,11),Edge(0,4,13),Edge(0,4,14),Edge(0,5,7),Edge(0,5,8),Edge(0,5,11),Edge(0,6,8),Edge(0,6,10),Edge(0,6,12),Edge(0,7,8),Edge(0,7,12),Edge(0,7,15),Edge(0,8,14),Edge(0,9,12),Edge(0,9,14),Edge(0,9,16),Edge(0,10,11),Edge(0,11,12),Edge(0,11,13),Edge(0,11,15),Edge(0,12,14),Edge(0,13,15),Edge(0,14,15)),Nodes(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16))
> Graph_Rand(3,4);
Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,2,3)),Nodes(1,2,3))
See Also:
?BipartiteGraph ?Graph_minus ?ParseDimacsGraph
?Clique ?Graph_XGMML ?Path
?DrawGraph ?InduceGraph ?RegularGraph
?Edge ?MaxCut ?ShortestPath
?EdgeComplement ?MaxEdgeWeightClique ?TetrahedronGraph
?Edges ?MinCut ?VertexCover
?FindConnectedComponents ?MST
?Graph ?Nodes
Graph_XGMML
Examples:
> G:=Graph(Edges(Edge(1,A,B),Edge(2,B,C)),Nodes(A,B,C));
G := Graph(Edges(Edge(1,A,B),Edge(2,B,C)),Nodes(A,B,C))
> print(Graph_XGMML(G));
See Also:
?BipartiteGraph ?Graph_minus ?ParseDimacsGraph
?Clique ?Graph_Rand ?Path
?DrawGraph ?InduceGraph ?RegularGraph
?Edge ?MaxCut ?ShortestPath
?EdgeComplement ?MaxEdgeWeightClique ?TetrahedronGraph
?Edges ?MinCut ?VertexCover
?FindConnectedComponents ?MST
?Graph ?Nodes
Graph_minus
Function Graph_minus
Calling Sequence: Graph_minus(G,V)
Graph_minus(G,E)
Graph_minus(G,G1)
Parameters:
Name Type Description
-------------------------------------
G Graph a given Graph
V Nodes Nodes to be removed
E Edges Edges to be removed
G1 Graph Subgraph to be removed
Returns:
Graph
Synopsis: This function removes a set of either edges or vertices from a
graph and returns the updated graph. Note that the deletion of an edge does
not remove the vertex end points of this edge. The deletion of a vertex
removes all incident edges.
Examples:
> G1 := Graph(Edges(Edge('a',1,2), Edge('b',2,3), Edge('c',3,4)), Nodes(1,2,3,4,5));
G1 := Graph(Edges(Edge(a,1,2),Edge(b,2,3),Edge(c,3,4)),Nodes(1,2,3,4,5))
> Graph_minus(G1, Edges(Edge('b',2,3)));
Graph(Edges(Edge(a,1,2),Edge(c,3,4)),Nodes(1,2,3,4,5))
> G1 minus Nodes(2,3);
Graph(Edges(),Nodes(1,4,5))
> Graph_minus(Graph(Edges(), Nodes(1,2,3,4)), Nodes(1,2,3,4));
Graph(Edges(),Nodes())
See Also:
?BipartiteGraph ?Graph_Rand ?ParseDimacsGraph
?Clique ?Graph_XGMML ?Path
?DrawGraph ?InduceGraph ?RegularGraph
?Edge ?MaxCut ?ShortestPath
?EdgeComplement ?MaxEdgeWeightClique ?TetrahedronGraph
?Edges ?MinCut ?VertexCover
?FindConnectedComponents ?MST
?Graph ?Nodes
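The two deletion rules in the synopsis (removing an edge keeps its end points, removing a vertex drops its incident edges) can be sketched in a few lines of Python. The function names and the (label, n1, n2) tuple representation are assumptions for illustration, not Darwin's data structures.

```python
def remove_edges(edges, nodes, drop_edges):
    """Delete the listed edges; every node is kept, including former end points."""
    drop = set(drop_edges)
    return [e for e in edges if e not in drop], list(nodes)


def remove_nodes(edges, nodes, drop_nodes):
    """Delete the listed nodes together with every edge incident to them."""
    drop = set(drop_nodes)
    kept_edges = [e for e in edges if e[1] not in drop and e[2] not in drop]
    kept_nodes = [v for v in nodes if v not in drop]
    return kept_edges, kept_nodes


edges = [("a", 1, 2), ("b", 2, 3), ("c", 3, 4)]
nodes = [1, 2, 3, 4, 5]
print(remove_edges(edges, nodes, [("b", 2, 3)]))  # ([('a', 1, 2), ('c', 3, 4)], [1, 2, 3, 4, 5])
print(remove_nodes(edges, nodes, [2, 3]))         # ([], [1, 4, 5])
```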
HTMLColor
Function HTMLColor( what, color )
Converts any text into html format with color information
what: any text
color: color keyword, at least 2 (3 for blue and black) letters
e.g. "ye" (yellow), "light blue", "pink", "li gr" etc
HTMLColorprint
Function HTMLColorprint( what:anything, Directory:string, Filename:string, Positions:array, Colors:array(string), index:integer, DP0 )
Prints what either into a file (if specified) or just returns the string
in html format.
what: MultiAlign: similar to print(), but creates a postscript file
for the tree in the same directory, with the same filename
and a .ps extension
string: adds to the text and prints it
Filename: Use NO extension. ".html" is added automatically
HTMLCols
Function HTMLCols( what:array(string), border:integer )
Puts each element of the array into different columns.
border: thickness of border (0 => invisible)
HTMLRows
Function HTMLRows( what:array(string), border:integer )
Puts each element of the array into different rows.
border: thickness of border (0 => invisible)
HTMLTitle
Function HTMLTitle( what:string, how:string )
Converts any text into html headings, bold or italic text
what: any text
how: keyword, at least 1 letter
e.g. "H1" (Heading 1), "Heading 4", "bold", "it" etc.
HTMLprint
Function HTMLprint( what:anything, Directory:string, Filename:string )
Prints what either into a file (if specified) or just returns the string
in html format.
what: MultiAlign: similar to print(), but creates a postscript file
for the tree in the same directory, with the same filename
and a .ps extension
string: adds to the text and prints it
Filename: Use NO extension. ".html" is added automatically
HammingSearchAllString
Function HammingSearchAllString - Find several approx instances of phrase in a
text
Calling Sequence: HammingSearchAllString(pat,txt,dist)
Parameters:
Name Type Description
-----------------------------------------
pat string a pattern that is sought
txt string a text which is searched
dist integer an (opt.) hamming dist
Returns:
list
Synopsis: The function HammingSearchAllString returns the list of indices of
all occurrences of the pattern in the text within a given Hamming distance
(default 1). If the pattern cannot be found, it returns an empty list. This
function is case insensitive.
Examples:
> HammingSearchAllString('cat', 'acgcatcatgcatcagtca');
[4, 7, 11, 14]
See Also:
?BestSearchString ?MatchRegex ?SearchMultipleString
?CaseSearchString ?SearchApproxString ?SearchString
?HammingSearchString ?SearchDelim
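The search described above can be sketched as a sliding window that counts mismatches at each position. The Python version below is an illustration, not the Darwin implementation; hamming_search_all is a hypothetical name, and the 1-based indices are chosen to match the example output.

```python
def hamming_search_all(pat: str, txt: str, dist: int = 1):
    """Return the 1-based start positions where pat matches txt with at most dist mismatches."""
    p, t = pat.lower(), txt.lower()  # the search is case insensitive
    hits = []
    for i in range(len(t) - len(p) + 1):
        mismatches = sum(a != b for a, b in zip(p, t[i:i + len(p)]))
        if mismatches <= dist:
            hits.append(i + 1)
    return hits


print(hamming_search_all("cat", "acgcatcatgcatcagtca"))  # [4, 7, 11, 14]
```

Position 14 is reported because "cag" differs from "cat" in a single letter.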
HammingSearchString
Function HammingSearchString
Option: builtin
Calling Sequence: HammingSearchString(pat,txt,tol)
Parameters:
Name Type
------------------
pat string
txt string
tol {0, posint}
Returns:
{-1,posint}
Synopsis: This function is almost identical to ApproxSearchString. The only
difference is that insertions and deletions are not allowed.
Examples:
> txt := 'AAAAAAAAAHeLLoBBBBB';
txt := AAAAAAAAAHeLLoBBBBB
> j := HammingSearchString('hallo', txt, 1);
j := 9
> j+txt;
HeLLoBBBBB
> HammingSearchString('aahllo', txt, 1);
-1
See Also:
?BestSearchString ?SearchApproxString ?SearchString
?CaseSearchString ?SearchDelim
?MatchRegex ?SearchMultipleString
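The examples above suggest that the result is the offset of the first match (the number of characters before it), or -1 when no window is within the tolerance, and that letter case is ignored. A Python sketch under those inferred assumptions follows; hamming_search is a hypothetical name.

```python
def hamming_search(pat: str, txt: str, tol: int) -> int:
    """Return the offset of the first window matching pat with at most tol substitutions, else -1."""
    p, t = pat.lower(), txt.lower()  # assumption: case is ignored, as in the example
    for offset in range(len(t) - len(p) + 1):
        mismatches = 0
        for a, b in zip(p, t[offset:]):
            if a != b:
                mismatches += 1
                if mismatches > tol:
                    break  # too many substitutions, try the next window
        else:
            return offset
    return -1


txt = "AAAAAAAAAHeLLoBBBBB"
j = hamming_search("hallo", txt, 1)
print(j, txt[j:])  # 9 HeLLoBBBBB
```

Because only substitutions are counted, a pattern that would need an insertion or deletion to match, such as 'aahllo', is not found.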
History
Data structure History( )
Function: creates a data structure to keep a history of what happened
Selectors:
Show: Prints the whole history
ID
Class ID - Data structure for storing IDs of the database DB
Template: ID(id)
Fields:
Name Type Description
------------------------------------------------------------------------
id {list,string,structure} ID(s) of Entries in the database DB
PatEntry, Match or Entry data structure
Returns:
ID
Methods: Entry ID_type Sequence
Synopsis: ID is a data structure which holds database identification tags
(IDs) contained in the <ID> and </ID> tags in a Darwin formatted database.
IDs can be used as arguments to other functions, e.g. Entry, Sequence, to
indicate that the Entry or sequence desired is the one with the given ID.
ID will attempt to convert its arguments when they are other entry
descriptions to IDs.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> id := ID('100K_RAT');
id := ID(100K_RAT)
> Entry(id);
> Sequence(id);
> ID(Entry(2));
ID(108_LYCES)
> ID(PatEntry(10000..10002));
ID(SYP1_YEAST,SYP_CHLPN,SYQ_DEIRA)
> ID(Sequence(Entry(1)));
ID(104K_THEPA)
See Also:
?AC ?Match ?SearchAC ?Sequence ?Species_Entry
?Entry ?PatEntry ?SearchID ?SPCommonName ?SP_Species
IdenticalTrees
Function IdenticalTrees - test whether two trees have the same topology
Calling Sequence: IdenticalTrees(t1,t2)
Parameters:
Name Type
-----------
t1 Tree
t2 Tree
Returns:
boolean
Synopsis: IdenticalTrees tests whether the two given trees have the same
topology (shape, relation between the leaves). The branch lengths are
ignored. The trees must have leaves based on the same labels (first
argument of Leaf). If the set of leaf labels differs, IdenticalTrees will
return false.
Examples:
> t1 := Tree(Tree(Leaf(a,2),1,Leaf(b,2)),0,Tree(Leaf(c,2),1,Leaf(d,2))):
> t2 := Tree(Tree(Leaf(a,2),1,Leaf(d,2)),0,Tree(Leaf(c,2),1,Leaf(b,2))):
> IdenticalTrees(t1,t2);
false
See also: ?Leaf ?Tree
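One way to compare topologies while ignoring branch lengths and child order is to reduce each tree to a canonical, order-free form. The Python sketch below is an illustration only: trees are nested tuples with string leaves, the names are hypothetical, and the comparison treats the trees as rooted, which is an assumption about what IdenticalTrees does.

```python
def topology(tree):
    """Canonical form of a tree: a leaf label, or the unordered set of its subtrees' forms."""
    if isinstance(tree, str):
        return tree
    return frozenset(topology(child) for child in tree)


def identical_trees(t1, t2) -> bool:
    """True when both trees group the same leaf labels in the same way."""
    return topology(t1) == topology(t2)


t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "d"), ("c", "b"))
print(identical_trees(t1, t2))  # False
```

Trees with different leaf sets also compare as unequal, in line with the behaviour described in the synopsis.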
Identity
Function Identity - create an identity matrix
Calling Sequence: Identity(n)
Parameters:
Name Type Description
---------------------------------------
n posint dimension of the matrix
Returns:
matrix(integer)
Synopsis: Creates a new identity matrix of dimension n x n.
Examples:
> Identity(3);
[[1, 0, 0], [0, 1, 0], [0, 0, 1]]
See Also:
?Cholesky ?GaussElim ?matrix
?convolve ?GivensElim ?matrix_inverse
?Eigenvalues ?LinearProgramming ?transpose
If
Function If
Option: builtin
Calling Sequence: If(cond,exptrue,expfalse)
Parameters:
Name Type
-----------------------------
cond boolean expression
exptrue expression
expfalse expression
Returns:
{type(expfalse),type(exptrue)}
Synopsis: The If construct provides a shorthand version of the
if-then-else-fi construct. Every If can be re-written as follows:
> if cond then exptrue else expfalse fi;
Note that the If function returns the result of exptrue or expfalse.
Examples:
> x:=5;
x := 5
> If(mod(x,2)=0, x/2, (x-1)/2);
2
InduceGraph
Function InduceGraph
Calling Sequence: InduceGraph(G,V)
InduceGraph(G,E)
Parameters:
Name Type Description
--------------------------------------
G Graph a given Graph
V Nodes Nodes inducing subgraph
E Edges Edges inducing subgraph
Returns:
Graph
Synopsis: This function computes a vertex- or edge-induced subgraph. A
vertex-induced subgraph is one that consists of some of the vertices of the
original graph and all of the edges that connect them in the original. An
edge-induced subgraph consists of some of the edges of the original graph
and the vertices that are at their endpoints.
Examples:
> G := Graph( {{1,2},{2,3},{1,3},{2,4}},{1,2,3,4} );
G := Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,2,3),Edge(0,2,4)),Nodes(1,2,3,4))
> InduceGraph( G, Nodes(1,2,3) );
Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,2,3)),Nodes(1,2,3))
> InduceGraph( G, Edges( Edge(0,1,2),Edge(0,2,3)) );
Graph(Edges(Edge(0,1,2),Edge(0,2,3)),Nodes(1,2,3))
See Also:
?BipartiteGraph ?Graph_minus ?ParseDimacsGraph
?Clique ?Graph_Rand ?Path
?DrawGraph ?Graph_XGMML ?RegularGraph
?Edge ?MaxCut ?ShortestPath
?EdgeComplement ?MaxEdgeWeightClique ?TetrahedronGraph
?Edges ?MinCut ?VertexCover
?FindConnectedComponents ?MST
?Graph ?Nodes
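The two kinds of induced subgraph defined in the synopsis can be sketched directly from their definitions. The Python functions below are hypothetical illustrations using (label, n1, n2) tuples for edges; they are not the Darwin implementation.

```python
def induce_by_nodes(edges, keep_nodes):
    """Vertex-induced subgraph: the chosen nodes plus every original edge joining two of them."""
    keep = set(keep_nodes)
    kept_edges = [e for e in edges if e[1] in keep and e[2] in keep]
    return kept_edges, list(keep_nodes)


def induce_by_edges(edges, keep_edges):
    """Edge-induced subgraph: the chosen edges (if present in the graph) plus their end points."""
    kept_edges = [e for e in keep_edges if e in edges]
    kept_nodes = sorted({v for _label, a, b in kept_edges for v in (a, b)})
    return kept_edges, kept_nodes


edges = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (0, 2, 4)]
print(induce_by_nodes(edges, [1, 2, 3]))               # ([(0, 1, 2), (0, 1, 3), (0, 2, 3)], [1, 2, 3])
print(induce_by_edges(edges, [(0, 1, 2), (0, 2, 3)]))  # ([(0, 1, 2), (0, 2, 3)], [1, 2, 3])
```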
InfixNr
Function InfixNr( t:Tree )
returns all numbers of the leaves in a tree (or a leaf)
Inherit
Function Inherit - Inherit all defined methods of the old into the new class
Calling Sequence: Inherit(newclass,oldclass)
Parameters:
Name Type Description
--------------------------------------------------
newclass symbol The new class being extended
oldclass symbol The class donating the methods
Returns:
NULL
Synopsis: All methods defined for the oldclass which are not defined in the
newclass are converted to work with the newclass. Any method that should
not be inherited must be defined for newclass before calling Inherit.
Alternatively, an unwanted method can be removed by using noeval. Multiple
inheritance is obtained by invoking Inherit more than once. Inherit
benefits from the availability of newclass_Rand, if the objects have some
special property. In general, it is a good idea to define all the methods
which are particular to newclass before invoking Inherit. Note: Since the
newclass is not a subclass of the oldclass (but only convertible) objects of
type newclass are not of type oldclass and a corresponding test with the
function "type" results in "false".
Examples:
> Polar := proc( Rho:numeric, Theta:numeric ) ... end;
> Polar_abs := proc( a:Polar ) a[Rho] end;
Polar_abs := proc (a:Polar) a[Rho] end
> Inherit(Polar,Complex);
See also: ?CompleteClass ?ExtendClass ?objectorientation ?Protect
IntOut
Function IntOut( IntMatrix:array(array(array)), IntMatrixTot:array(array) )
Returns for each position the IntProb of being interior,
the size of the largest APC subgroup at the specified MaxPW and IntAA used
to determine IntProb
IntToA
Function IntToA - convert an integer into a 1 letter amino-acid name
Option: builtin
Calling Sequence: IntToA(x)
Parameters:
Name Type Description
----------------------------------------
x integer an integer from 1 to 20
Returns:
string
Synopsis: This function converts a posint into a one letter abbreviation of
an amino acid. This follows the standard ordering of amino acids. (See
?aminoacids)
Examples:
> IntToA(20);
V
See Also:
?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToCInt
?aminoacids ?BBBToInt ?CIntToInt ?IntToAAA ?IntToCodon
?AminoToInt ?BToInt ?CodonCode ?IntToAmino
?AToCInt ?CIntToA ?CodonToA ?IntToB
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase
?AToInt ?CIntToAmino ?CodonToInt ?IntToBBB
IntToAAA
Function IntToAAA - convert an integer into a 3 letter amino-acid name
Option: builtin
Calling Sequence: IntToAAA(x)
Parameters:
Name Type Description
----------------------------------------
x integer an integer from 1 to 20
Returns:
string
Synopsis: This function converts a posint into a three letter abbreviation of
an amino acid. This follows the standard ordering of amino acids. (See
?aminoacids)
Examples:
> IntToAAA(1);
Ala
See Also:
?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToCInt
?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCodon
?AminoToInt ?BToInt ?CodonCode ?IntToAmino
?AToCInt ?CIntToA ?CodonToA ?IntToB
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase
?AToInt ?CIntToAmino ?CodonToInt ?IntToBBB
IntToAmino
Function IntToAmino - convert an integer into an amino-acid name
Option: builtin
Calling Sequence: IntToAmino(x)
Parameters:
Name Type Description
----------------------------------------
x integer an integer from 1 to 20
Returns:
string
Synopsis: This function converts a posint into the full name for an amino
acid following the standard ordering of amino acids. (See ?aminoacids)
Examples:
> IntToAmino(15);
Proline
See Also:
?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToCInt
?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCodon
?AminoToInt ?BToInt ?CodonCode ?IntToAAA
?AToCInt ?CIntToA ?CodonToA ?IntToB
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase
?AToInt ?CIntToAmino ?CodonToInt ?IntToBBB
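The three conversions IntToA, IntToAAA and IntToAmino share one lookup over the standard amino acid ordering, which is the order used in the GetAaFrequency listing (Alanine first, Valine last). The Python table below is a sketch of that lookup; the function names are hypothetical and only mirror the documented examples.

```python
# Standard ordering, positions 1..20: (one letter, three letter, full name).
AMINO_ACIDS = [
    ("A", "Ala", "Alanine"), ("R", "Arg", "Arginine"), ("N", "Asn", "Asparagine"),
    ("D", "Asp", "Aspartic acid"), ("C", "Cys", "Cysteine"), ("Q", "Gln", "Glutamine"),
    ("E", "Glu", "Glutamic acid"), ("G", "Gly", "Glycine"), ("H", "His", "Histidine"),
    ("I", "Ile", "Isoleucine"), ("L", "Leu", "Leucine"), ("K", "Lys", "Lysine"),
    ("M", "Met", "Methionine"), ("F", "Phe", "Phenylalanine"), ("P", "Pro", "Proline"),
    ("S", "Ser", "Serine"), ("T", "Thr", "Threonine"), ("W", "Trp", "Tryptophan"),
    ("Y", "Tyr", "Tyrosine"), ("V", "Val", "Valine"),
]


def _lookup(x: int, column: int) -> str:
    if not 1 <= x <= len(AMINO_ACIDS):
        raise ValueError("amino acid integer must be between 1 and 20")
    return AMINO_ACIDS[x - 1][column]


def int_to_a(x: int) -> str:
    return _lookup(x, 0)


def int_to_aaa(x: int) -> str:
    return _lookup(x, 1)


def int_to_amino(x: int) -> str:
    return _lookup(x, 2)


print(int_to_a(20), int_to_aaa(1), int_to_amino(15))  # V Ala Proline
```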
IntToAscii
Function IntToAscii - convert an integer to its ascii ordinal character
Option: builtin
Calling Sequence: IntToAscii(i)
Parameters:
Name Type Description
--------------------------------------------
i posint an integer between 1 and 255
Returns:
string
Synopsis: Converts an integer between 1 and 255 to its ascii ordinal
character. The null character (octal 000) cannot be represented. This
function provides an easy way to generate non-printable characters, or special
(accented) characters. This is useful when encoding/decoding symbols for
dynamic programming. It is also useful in general for the analysis of raw
input.
Examples:
> IntToAscii(97);
a
> IntToAscii(126);
~
See Also:
?AsciiToInt ?HammingSearchString ?SearchDelim
?AToInt ?IntToA ?SearchMultipleString
?BestSearchString ?MatchRegex ?SearchString
?CaseSearchString ?SearchApproxString
IntToB
Function IntToB - Integer to One Letter Nucleic Acid Code
Option: builtin
Calling Sequence: IntToB(x)
Parameters:
Name Type
-------------
x {1..6}
Returns:
{A,C,G,T,U,X}
Synopsis: This function converts an integer between 1..6 into the one letter
code for nucleic acids A, C, G, T, U, X.
Examples:
> IntToB(1);
A
> IntToB(6);
X
See Also:
?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToCInt
?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCodon
?AminoToInt ?BToInt ?CodonCode ?IntToAAA
?AToCInt ?CIntToA ?CodonToA ?IntToAmino
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase
?AToInt ?CIntToAmino ?CodonToInt ?IntToBBB
IntToBBB
Function IntToBBB - Integer to Three Letter Nucleic Acid Code
Option: builtin
Calling Sequence: IntToBBB(x)
Parameters:
Name Type
-------------
x {1..5}
Returns:
{Ade,Cyt,Gua,Thy,Ura}
Synopsis: This function converts an integer between 1..5 into the three
letter code for nucleic acids Ade, Cyt, Gua, Thy, Ura respectively.
Examples:
> IntToBBB(1);
Ade
See Also:
?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToCInt
?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCodon
?AminoToInt ?BToInt ?CodonCode ?IntToAAA
?AToCInt ?CIntToA ?CodonToA ?IntToAmino
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToB
?AToInt ?CIntToAmino ?CodonToInt ?IntToBase
IntToBase
Function IntToBase - Integer to Nucleic Acid Name
Option: builtin
Calling Sequence: IntToBase(x)
Parameters:
Name Type
-------------
x {1..5}
Returns:
{Adenine,Cytosine,Guanine,Thymine,Uracil}
Synopsis: This function converts an integer between 1..5 into the full name
for a nucleic acid Adenine, Cytosine, Guanine, Thymine, Uracil respectively.
Examples:
> IntToBase(1);
Adenine
See Also:
?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToCInt
?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCodon
?AminoToInt ?BToInt ?CodonCode ?IntToAAA
?AToCInt ?CIntToA ?CodonToA ?IntToAmino
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToB
?AToInt ?CIntToAmino ?CodonToInt ?IntToBBB
IntToCInt
Function IntToCInt - Amino Acid Integer to List of Codon Integers
Calling Sequence: IntToCInt(AA)
Parameters:
Name Type Description
----------------------------------
AA posint amino acid integer
Returns:
list
Synopsis: This function converts an amino acid integer code into a list of
the corresponding codon integers. It will convert the symbol for a stop
codon '$' into a list of stop codons.
Examples:
> IntToCInt('$');
[49, 51, 57]
> IntToCInt(4);
[34, 36]
See Also:
?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToBBB
?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCodon
?AminoToInt ?BToInt ?CodonCode ?IntToAAA
?AToCInt ?CIntToA ?CodonToA ?IntToAmino
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToB
?AToInt ?CIntToAmino ?CodonToInt ?IntToBase
IntToCodon
Function IntToCodon - Integer Amino Acid Representation to List of Codons
Calling Sequence: IntToCodon(AA)
Parameters:
Name Type Description
----------------------------------------
AA integer amino acid integer code
Returns:
list
Synopsis: This function converts an amino acid integer code (see ?aminoacids)
into a list of the corresponding codons. The amino acid integer code for
the stop codons is 22.
Examples:
> IntToCodon(22);
[TAA, TAG, TGA]
> IntToCodon(5);
[TGC, TGT]
See Also:
?AAAToInt ?BaseToInt ?CIntToCodon ?GeneticCode ?IntToBBB
?aminoacids ?BBBToInt ?CIntToInt ?IntToA ?IntToCInt
?AminoToInt ?BToInt ?CodonCode ?IntToAAA
?AToCInt ?CIntToA ?CodonToA ?IntToAmino
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToB
?AToInt ?CIntToAmino ?CodonToInt ?IntToBase
Interior
Function Interior( Cluster:list(list(list)), MA:array(string), MaxPW:array, IntAA:array, ActMatrixOut:array )
Reports the length of the largest subgroup at defined PAM
windows in which all amino acids are of the types defined in IntAA
InteriorTot
Function InteriorTot( IntMatrix:array(array(array)) )
Reports the sum of the length of all the largest subgroups at
defined PAM windows and IntAAs counted over all positions
IntraDistance
Function IntraDistance - Computes the pairwise distances between trees in a
list
Calling Sequence: IntraDistance(Trees,DistanceFunction)
Parameters:
Name Type Description
----------------------------------------------------------------------
Trees list(Tree) list of trees
DistanceFunction procedure (optional), distance between two trees
Returns:
table
Synopsis: IntraDistance computes the distances between every pair of trees in
the given list over the set of common leaves. That is, each pair of trees
is first reduced to the subtrees of the common leaves and then the distance
is computed. If there are fewer than 4 common leaves, the pair is ignored, as
the distance will always be 0. IntraDistance returns a table which contains
the first three moments (0th, first and second) of the distance distribution
per size of the intersecting leaves. That is to say, if r is the result,
then r[4] will be a list of 3 values, which are the 0th, 1st and 2nd moment
of the distribution of distances for trees which shared exactly 4 leaves.
If no DistanceFunction is provided, the Robinson-Foulds distance will be used.
If a branch length of a tree is less than or equal to MinLen, the branch is
assumed not to exist, i.e. this is a case of multifurcation rather than
bifurcation, and the corresponding edge is not counted in the distance. This
differs from the RobinsonFoulds distance and makes it possible to compute
distances to trees with partial information, such as trees derived from
taxonomic data.
Examples:
> st1 := Tree(Leaf(a,2),1,Leaf(b,2)):
> st2 := Tree(Leaf(c,2),1,Leaf(d,2)):
> st3 := Tree(Leaf(e,2),0.5,st2):
> st4 := Tree(Leaf(a,2),0.5,Tree(Leaf(e,2),1,Leaf(b,2))):
> r := IntraDistance( [Tree(st1,0,st2),Tree(st3,0,st1),Tree(st4,0,st2)] ):
> print(r);
4 --> [2, 0, 0]
5 --> [1, 1, 1]
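The shape of the result table can be sketched in Python. The sketch assumes the moments are raw sums (count, sum of distances, sum of squared distances), which is consistent with the output above; it takes the per-pair results as given and does not compute tree distances itself. The name accumulate_moments is hypothetical.

```python
from collections import defaultdict


def accumulate_moments(pairs):
    """pairs: iterable of (number_of_common_leaves, distance) per tree pair.

    Returns {n_common: [count, sum_d, sum_d_squared]}.
    """
    table = defaultdict(lambda: [0, 0, 0])
    for n_common, d in pairs:
        if n_common < 4:
            continue  # pairs with fewer than 4 common leaves are ignored
        m = table[n_common]
        m[0] += 1      # 0th moment
        m[1] += d      # 1st moment
        m[2] += d * d  # 2nd moment
    return dict(table)
```

In the example above, two pairs share 4 leaves at distance 0 and one pair shares 5 leaves at distance 1, so accumulate_moments([(4, 0), (4, 0), (5, 1)]) gives the same table as printed.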
See Also:
?BipartiteSquared ?LeastSquaresTree ?RobinsonFoulds
?BootstrapTree ?PhylogeneticTree ?SignedSynteny
?ComputeDimensionlessFit ?RBFS_Tree ?Synteny
?GapTree ?ReconcileTree
Intron
Class Intron
Template: Intron(n,pam,div)
Fields:
Name Type Description
-------------------------------------
n string nucleotide sequence
pam numeric PAM distance
div string code of the division
Returns:
Intron
Methods: Intron_type
Global Variables: IT_model IT_olddiv IT_oldn IT_oldres IT_scores
Synopsis: Computes and stores the Bayesian probabilistic intron scoring
model. Use Intron(div) to select the scoring model for division div.
Divisions are fun, inv, mam, pln, pri, pro, rod, vrt, any.
Examples:
See also:
IntronModel
Class IntronModel
Template: IntronModel(Donor,InIntron,Acceptor,MinLen)
Fields:
Name Type Description
-----------------------------------
Donor GramSite
InIntron GramRegion
Acceptor GramSite
MinLen posint
Returns:
IntronModel
Methods: IntronModel_type print select
Synopsis: Structure to hold intron scoring model data.
See also: ?LinearIntron
IsolationIndex
Function IsolationIndex( d:matrix(numeric), I:set )
Computes the isolation index for the split [I, {1..length(d)} minus I].
KHTest
Function KHTest - Runs KH test on two tree topologies over a MAlignment.
Calling Sequence: KHTest(msa,t1,t2)
Parameters:
Name Type Description
------------------------------------------------------------------------
msa MAlignment Multiple sequence alignment
t1 Tree First tree
t2 Tree Second tree
method string (optional) BS; RELL; CONV (default)
subst string (optional) Substitution model for PhyML (LG)
nrOfBootraps posint (optional) Number of bootstraps (100)
sigLevel numeric (optional) Significance level
Returns:
boolean
Synopsis: Run KH test on two tree topologies over a MAlignment and return
whether the null hypothesis is rejected or not. Tree topologies are kept
fixed during resampling. KHTest returns true if null hypothesis is rejected,
false otherwise. PhyML is employed to do likelihood maximization and must be
installed in order to use this function. KHTest uses either a convolution
(default), RELL or bootstrap.
References: Goldman N., Anderson J.P., Rodrigo A.G. Likelihood-Based Tests of
Topologies in Phylogenetics, Systematic Biology, 49:652-670, 2000
Examples:
> ReadProgram('datasets/quartet1/trees.drw');;
> msa := ReadFastaIntoMAlignment('datasets/quartet1/MSA_1.fa');;
> lprint('BootStrap', KHTest(msa,tree1,tree2,method='BS'));;
> lprint('RELL', KHTest(msa,tree1,tree2,method='RELL'));;
KWIndex
Function KWIndex - Compute the Kabat-Wu Variation Index
Calling Sequence: KWIndex(ma)
Parameters:
Name Type Description
--------------------------------------------------
ma array(string) multiple sequence alignment
Returns:
list(numeric)
Synopsis: Computes the Kabat-Wu variation index for all positions of a
multiple alignment.
References: T.T. Wu, E.A. Kabat: An analysis of the sequences of the variable
regions of Bence Jones proteins and myeloma light chains and their
implications for antibody complementarity. J. Exp. Med. 132(1970): 211-250.
Examples:
> ma := [
' -------------------------FPE',
' - ..(295).. LQCVKYYYV'];
ma := [ -------------------------FPE, -------------------IASAGFVRD, AKQVVLLIFGSWQLARERLANEMRKAVAY__T, AEPIVPLLFGMWRLKRKKANNKLLRCVKY__T, AEVIVPLLFGVWRLKREERTYTLLQCVKY__V, AEPIVPLLFGLWQLAREKASNTLLQCVKY__V, EPIVPLL__MWQLAIEKSSNTLLQCVK__KV, PIVPLLFGMWQLAREKASNTLLQCVKYYYV]
> kwxd := KWIndex (ma);
kwxd := [1, 2.5000, 4.5000, 2.4000, 1, 2.4000, 1, 2.4000, 1, 1, 8, 1, 3, 1, 3, 2.4000, 2.4000, 4.5000, 8, 8, 2.4000, 4.5000, 2.4000, 4.2000, 7, 4.2000, 2.3333, 4.2000, 2.4000, 9, 16, 8]
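The Kabat-Wu variability of a column is the number of distinct residues divided by the relative frequency of the most common one, i.e. N*k/n_max for N residues, k distinct residues and n_max occurrences of the most common residue. A minimal Python sketch follows; the treatment of gap characters (skipped here) and of short sequences is an assumption and may differ from what KWIndex does. The name kabat_wu_index is hypothetical.

```python
from collections import Counter

GAP_CHARS = set("_- ")  # assumption: gap-like characters are not counted


def kabat_wu_index(ma):
    """Return the Kabat-Wu variability for each column of the alignment ma."""
    width = max(len(s) for s in ma)
    result = []
    for pos in range(width):
        column = [s[pos] for s in ma
                  if pos < len(s) and s[pos] not in GAP_CHARS]
        if not column:
            result.append(0.0)  # column with gaps only: no variability defined
            continue
        counts = Counter(column)
        n = len(column)              # residues observed in this column
        k = len(counts)              # distinct residues
        most = max(counts.values())  # occurrences of the most common residue
        result.append(n * k / most)
    return result
```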
See also: ?PlotIndex ?PrintIndex ?ProbIndex ?ScaleIndex
LSBestDelete
Function LSBestDelete( AtA:matrix(numeric), btA:list(numeric), btb:numeric )
Least Squares approximation removing the least significant variable.
LSBestDelete finds the least significant independent variable to remove.
This variable is least significant in the sense that it increases the norm
of the residuals by the least amount. This is the reverse process of
Stepwise regression, where we start with all the independent variables
and remove the one with the least norm increase at a time.
Problem: Given a matrix of A (dim n x m) and a vector b (dim n), we want
to find a vector x (dim m) such that Ax ~ b, where x has one entry
which is zero. This approximation is in the least squares sense,
i.e. ||Ax-b||^2 is minimum.
The calling arguments are:
AtA is a matrix (dim m x m) which is the product A^t * A
btA is a vector (dim m) which is the product b^t * A
btb is the norm squared of b, i.e. b^t * b
Output: The output is a list with two values: [i,norm], where
i is the index of the variable removed
norm is the value of the norm of the residuals without this variable
i.e. norm = ||Ax-b||^2
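The computation can be sketched from the normal equations alone: for a subset of variables, the least squares solution satisfies AtA x = Atb, and the residual norm is btb - btA*x. The Python sketch below tries every single deletion by brute force. It is an illustration under the assumption that every reduced system is non-singular, not the algorithm Darwin uses; solve and ls_best_delete are hypothetical names.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(map(float, M[r])) + [float(v[r])] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def ls_best_delete(AtA, btA, btb):
    """Return [i, norm]: the 1-based variable whose removal hurts least."""
    m = len(btA)
    best = None
    for i in range(m):
        keep = [k for k in range(m) if k != i]
        sub = [[AtA[r][c] for c in keep] for r in keep]
        rhs = [btA[k] for k in keep]
        x = solve(sub, rhs)
        # ||Ax-b||^2 = b^t b - b^t A x at the least squares solution
        norm = btb - sum(r * xi for r, xi in zip(rhs, x))
        if best is None or norm < best[1]:
            best = (i + 1, norm)
    return list(best)
```

For A equal to the 2 x 2 identity and b = [3, 1], removing the second variable leaves a residual of 1, while removing the first leaves 9.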
See Also: ?LSBestSum ?LSBestSumDelete
LSBestSum
Function LSBestSum( AtA:matrix(numeric), btA:list(numeric), btb:numeric )
Least Squares approximation using the best sum of independent variables.
LSBestSum finds the best pair of variables which can be replaced by
their sum. This pair is best in the sense of increasing the norm of
the residuals by the least amount.
Problem: Given a matrix of A (dim n x m) and a vector b (dim n), we want
to find a vector x (dim m) such that Ax ~ b, where x has two values
which are identical. This approximation is in the least squares
sense, i.e. ||Ax-b||^2 is minimum.
The calling arguments are:
AtA is a matrix (dim m x m) which is the product A^t * A
btA is a vector (dim m) which is the product b^t * A
btb is the norm squared of b, i.e. b^t * b
Output: The output is a list with three values: [i,j,norm], where
i and j are integers and are the indices of the variables which
are replaced by their sum.
norm is the value of the norm of the residuals with this sum,
i.e. norm = ||Ax-b||^2
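Forcing two coefficients to be equal is the same as replacing the two columns of A by their sum, which can be done directly on AtA and btA by adding the corresponding rows and columns. The Python sketch below tries every pair by brute force. It assumes non-singular reduced systems and is an illustration rather than Darwin's implementation; solve and ls_best_sum are hypothetical names.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(map(float, M[r])) + [float(v[r])] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def ls_best_sum(AtA, btA, btb):
    """Return [i, j, norm] (1-based) for the best pair replaced by its sum."""
    m = len(btA)
    best = None
    for i in range(m):
        for j in range(i + 1, m):
            # Variable j is folded into variable i; the others stay alone.
            groups = [[i, j] if k == i else [k] for k in range(m) if k != j]
            sub = [[sum(AtA[r][c] for r in gr for c in gc) for gc in groups]
                   for gr in groups]
            rhs = [sum(btA[k] for k in g) for g in groups]
            x = solve(sub, rhs)
            norm = btb - sum(r * xi for r, xi in zip(rhs, x))
            if best is None or norm < best[2]:
                best = (i + 1, j + 1, norm)
    return list(best)
```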
See Also: ?LSBestSumDelete ?LSBestDelete
LSBestSumDelete
Function LSBestSumDelete( AtA:matrix(numeric), btA:list(numeric), btb:numeric )
Least Squares approximation using the best sum of independent variables
or best deleted variable.
LSBestSumDelete finds the best pair of variables which can be replaced by
their sum or the best variable that can be removed. This is best in the
sense of increasing the norm of the residuals by the least amount.
This function does the work of both LSBestSum and LSBestDelete in one
pass.
Problem: Given a matrix of A (dim n x m) and a vector b (dim n), we want
to find a vector x (dim m) such that Ax ~ b, where x has two values
which are identical or one value which is zero. This approximation
is in the least squares sense, i.e. ||Ax-b||^2 is minimum.
The calling arguments are:
AtA is a matrix (dim m x m) which is the product A^t * A
btA is a vector (dim m) which is the product b^t * A
btb is the norm squared of b, i.e. b^t * b
Output: The output is a list with three values: [i,j,norm], where
i and j are integers and are the indices of the variables which
are replaced by their sum. If i=0 then j is the variable to
be removed.
norm is the value of the resulting norm of the residuals,
i.e. norm = ||Ax-b||^2
See Also: ?LSBestSum ?LSBestDelete
Leaf
Class Leaf - external node for binary Tree
Template: Leaf(Label)
Leaf(Label,Height)
Fields:
Name Type Description
-----------------------------------
Label anything optional label
Height numeric optional height
Returns:
Leaf
Methods: Leaf_type
Synopsis: The Leaf structure holds the information associated with the leaf
of a tree (Tree structure). The format is generally unspecified allowing
Leaf structures containing anything. However, most phylogenetic tree
construction algorithms in Darwin assume that a leaf label is contained in
the first position and the height information is contained in the second
position. Type testing for Tree will also yield true for a Leaf so that
recursive trees with Leaf() nodes are easy to code. If additional
information needs to be stored in the Leaf, the Leaf class can be extended
with ExtendClass. Alternatively, extra arguments to Leaf will be left
undisturbed.
Examples:
> t:=Tree(Tree(Leaf(A),5,Leaf(B)),0,Tree(Leaf(C),11,Leaf(D)));
t := Tree(Tree(Leaf(A),5,Leaf(B)),0,Tree(Leaf(C),11,Leaf(D)))
> t[Left, Left, Label];
A
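The remark that a Leaf also counts as a Tree is what keeps recursive code short. A minimal Python sketch of the same structure follows; the classes and the helpers select and leaf_labels are illustrative stand-ins for the Darwin structures and selectors, not a port of them.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class Leaf:
    Label: Any = None
    Height: Optional[float] = None


@dataclass
class Tree:
    Left: Union["Tree", Leaf]
    Height: float
    Right: Union["Tree", Leaf]


def select(node, *path):
    """Follow a chain of field names, like t[Left, Left, Label] in Darwin."""
    for name in path:
        node = getattr(node, name)
    return node


def leaf_labels(node):
    """Collect labels left to right; a Leaf ends the recursion."""
    if isinstance(node, Leaf):
        return [node.Label]
    return leaf_labels(node.Left) + leaf_labels(node.Right)


t = Tree(Tree(Leaf("A"), 5, Leaf("B")), 0, Tree(Leaf("C"), 11, Leaf("D")))
```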
See also: ?DrawTree ?ExtendClass ?Infix ?Leaves ?Postfix ?Prefix ?Tree
LeastSquaresTree
Function LeastSquaresTree - compute a distance phylogenetic tree using least
squares
Option: builtin
Calling Sequence: LeastSquaresTree(Dist,Var)
LeastSquaresTree(Dist,Var,Labels)
LeastSquaresTree(Dist,Var,Labels,IniTree,Keep)
Parameters:
Name Type Description
---------------------------------------------------------------------------------
Dist matrix(numeric) Pairwise distances
Var matrix(numeric) Variances
Labels list Optional labels for the leaves
IniTree Tree Initial tree to optimize its branch lengths
IniTree 'Random' To start with a completely random tree
IniTree 'NJRandom' To start with a random Neighbour-joining like tree
IniTree 'Trials' = posint Run n trials with NJRandom and return the best tree
Keep 'KeepTopology' (optional) Optimize branch lengths only
Returns:
Tree
Synopsis: This function computes a binary tree which approximates the given
distances Dist by least squares. The distances are assumed to have a
variance given by the matrix Var. If a list Labels is given, the leaves of
the resulting tree are labelled with these values. The Leaf nodes produced
have 3 fields: (1) the label given (or their integer index if no Labels are
given), (2) the height of the Leaf and (3) their integer index. If the
global variable MinLen is assigned a positive value, it will determine the
minimum branch length. If not set, 1/1000th of the average distance between
leaves is used. The quality of the fit is measured by the sum of the
squares of the weighted deviations divided by (n-2)(n-3)/2. This value is
stored in the global variable MST_Qual. A dimensionless fitting index is
also computed, it is the MST_Qual / variance(Dist) * harmonic_mean(Var).
This value is printed and stored in the global variable DimensionlessFit.
Trees built over the same set of species, even with radically different
methods, can be ranked by the quality of their fit with this index. If the
fourth parameter has a Tree, then this tree is taken and optimized.
If the fourth argument is the word "Random", then the optimization is started
over a random tree. For large trees it makes sense to try several random
trees and choose the one with the best MST_Qual. When starting with random
trees, the global variable MST_Prob can be set to any numerical value
between 0 and 1. Values close to 1 select trees which are very close to the
one given by Neighbour Joining. Values close to 0 select completely random
trees. Leaving MST_Prob unassigned is equivalent to using NJRandom.
When "NJRandom" is used, a Neighbour-joining like tree is made with a variable
level of randomness at each step, which may produce better random trees.
When the word KeepTopology is used, the optimization is done only on the
branch lengths. This is useful to optimize the branches of a given tree.
The function Tree_matrix extracts the distance matrix from a tree. It is
roughly the inverse of LeastSquaresTree.
Examples:
> D := [[0, 3, 13, 10], [3, 0, 14, 11], [13, 14, 0, 9], [10, 11, 9, 0]];
D := [[0, 3, 13, 10], [3, 0, 14, 11], [13, 14, 0, 9], [10, 11, 9, 0]]
> V := [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]];
V := [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
> LeastSquaresTree(D, V);
dimensionless fitting index 0
> t := LeastSquaresTree(D, V, [AA, BB, CC, DD]);
dimensionless fitting index 0
> print(Tree_matrix(t));
0 3 13 10
3 0 14 11
13 14 0 9
10 11 9 0
See Also:
?BootstrapTree ?Leaf ?Synteny
?ComputeDimensionlessFit ?PhylogeneticTree ?Tree
?DrawTree ?RBFS_Tree ?Tree_matrix
?GapTree ?SignedSynteny ?ViewPlot
LinearClassification
Class LinearClassification - results of a linear classification
Template: LinearClassification(X,X0,WeightPos,WeightNeg,NumberPos,NumberNeg,
WeightedFalses,HighestNeg,LowestPos)
Fields:
Name Type Description
----------------------------------------------------------------------
X list(numeric) solution vector
X0 numeric threshold value
WeightPos numeric weight of the positives
WeightNeg numeric weight of the negatives
NumberPos posint number of positives
NumberNeg posint number of negatives
WeightedFalses numeric weighted misclassifications
HighestNeg list([posint, numeric]) highest scoring negatives
LowestPos list([posint, numeric]) lowest scoring positives
Returns:
LinearClassification
Methods: DirOpt2 DirOpt3 DirOpt4 LinearClassification_type print
refine
Synopsis: Data structure which holds the result of a linear classification.
A linear classification is defined by a vector X, such that the inner
product of every data point A[i] with X can be compared against a threshold
to decide whether the data point is a positive or a negative. I.e. A[i] X
< X0 implies a negative and A[i] X >= X0 a positive.
See also: ?LinearClassify
LinearClassify
Function LinearClassify - Linear form which does pos/neg classification
Calling Sequence: LinearClassify(A,accept,mode,WeightNeg)
Parameters:
Name Type Description
-----------------------------------------------------------------------------
A matrix(numeric) an n x m matrix of independent variables
accept {list,procedure} positive/negative determination
mode anything optional, algorithm selection, defaults to Svd
WeightNeg positive optional, weight of negatives
Returns:
LinearClassification
Global Variables: BestLinearClassifications ComputeSensitivity
Synopsis: Computes a vector X such that the values A[i]*X >= X0 classify the
positive/negatives. Such a vector is called a linear discriminant in
statistics. A special way of computing such a vector is called the Fisher
linear discriminant. LinearClassify normally produces results which are
much better than the Fisher linear discriminant. LinearClassify returns a
data structure called LinearClassification which contains the vector X and
the splitting value, called X0, the weights of positive and negatives, the
score obtained and the worst misclassifications. The third argument, if
present, directs the function to use a particular method of computation.
The methods are
mode description
-------------------------------------------------------------------------
BestBasis equivalent to BestBasis(10)
BestBasis(k) Least Squares to 0-1 using SvdBestBasis of size k
Svd (default mode) equivalent to Svd(1e-5)
Svd(bound) Least Squares to 0-1 using Svd, with svmin=bound
Svd(First(k)) LS to 0-1 using Svd so that k sing values are used
CenterMass Direction between the pos/neg center of masses
Variance Variance Discrimination of each variable
Fisher Fisher linear discriminant
Logistic Steepest descent optimization using the logistic function
CrossEntropy Steepest descent optimization using cross entropy
Best equivalent to Best(10)
Best(n) A combination of methods found most effective by
experimentation. n determines the amount of optimization.
Svd(1e-5)
In practice the best results are obtained with Best(n). Svd(1e-12) and
CrossEntropy are also very effective. Once a LinearClassification has been
computed, it can be improved or refined with the functions:
function description
--------------------------------------------------------------------
LinearClassification_refine find the center of the min in each dim
LinearClassification_refine2 Svd applied to a hyperswath
LinearClassification_refine3 Minimize in a random direction
LinearClassification_refine4 Svd on progressively smaller swaths
Unless you use the Best option, no matter how good the initial results are, it
always pays to do some refinement steps. In particular refine2 and refine4
give very good refinements.
Examples:
> A := [[0,3], [8,5], [10,7], [5,5], [7,4], [7,9]]:
> lc := LinearClassify( A, [0,1,1,0,1,0] ):
> print(lc);
solution vector is X = [0.1945, -0.1364]
discriminator is A[i] * X > 0.553293
6 data points, 3 positive, 3 negative
positives weigh 1, negatives 1, overall misclassifications 0
Highest negative scores: []
Lowest positive scores: []
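Applying a computed classification is a single inner product per data point. The Python sketch below reuses the rounded solution vector and threshold printed above and reproduces the accept list passed to LinearClassify; the function name classify is hypothetical.

```python
def classify(A, X, X0):
    """Return 1 (positive) where A[i]*X >= X0 and 0 (negative) otherwise."""
    return [1 if sum(a * x for a, x in zip(row, X)) >= X0 else 0
            for row in A]


A = [[0, 3], [8, 5], [10, 7], [5, 5], [7, 4], [7, 9]]
X = [0.1945, -0.1364]  # rounded solution vector from the printed result
X0 = 0.553293          # threshold from the printed result
predicted = classify(A, X, X0)
```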
See Also:
?LinearClassification ?Stat ?SvdBestBasis
?LinearRegression ?SvdAnalysis
LinearIntron
Class LinearIntron
Template: LinearIntron(n,pam,minlen,F,I)
Fields:
Name Type
----------------------------
n nucleotide sequence
pam numeric
minlen integer
F numeric
I numeric
Returns:
NULL
Methods: LinearIntron_type
Global Variables: LI_oldF LI_oldI LI_oldlen LI_oldn LI_oldres
Synopsis: Computes and stores the general linear intron scoring model. Use
LinearIntron(minlen, F, I) to score F + (len - 1) * I for any subsequence of
length len >= minlen fulfilling the GT-AG rule.
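The scoring rule can be illustrated with a brute-force enumeration in Python. The sketch assumes that "fulfilling the GT-AG rule" means the subsequence starts with GT and ends with AG, and that the two dinucleotides may not overlap (length at least 4); linear_intron_scores is a hypothetical name and the search strategy is not Darwin's.

```python
def linear_intron_scores(n, minlen, F, I):
    """List (start, end, score) for each GT...AG subsequence n[start:end]
    whose length is at least minlen; score = F + (len - 1) * I."""
    n = n.upper()
    out = []
    for start in range(len(n) - 1):
        if n[start:start + 2] != "GT":
            continue
        # Shortest candidate is GT immediately followed by AG (length 4).
        for end in range(start + max(minlen, 4), len(n) + 1):
            if n[end - 2:end] == "AG":
                length = end - start
                out.append((start, end, F + (length - 1) * I))
    return out
```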
See also: ?IntronModel
LinearProgramming
Function LinearProgramming - Solves a linear optimization problem
Calling Sequence: LinearProgramming(A,b,c)
Parameters:
Name Type Description
----------------------------------------------------------------------------------
A matrix(numeric) Matrix of LHS coefficients
b list(numeric) Vector of RHS coefficients
c {Feasibility,list(numeric)} Vector of coefficients for objective function
Returns:
[list(numeric), set(posint)] : where the first element is the solution and the second is the set of indices to rows of A which define the corner x
SimplexHasNoSolution : when there is no solution
SimplexIsSingular : when it cannot find a subset of rows from A which is non-singular
UnboundedSolution(x,d) : where x + h*d, is a solution for any h>=0 and c*(x+h*d) grows unboundedly
Synopsis: LinearProgramming( A, b, c ) solves the problem of finding a vector
x such that Ax >= b and c*x is maximum.
The variables in x are unconstrained in sign: they can be positive or
negative. For the classical problem with x >= 0, these conditions have to be
stated explicitly as rows of A and b.
If c is 'Feasibility', LinearProgramming only attempts to find a feasible
solution, which is returned without any optimization. This saves computation.
Examples:
> A := [[-1, -1.5000], [-2, -1], [1, 0], [0, 1]];
> b := [-750, -1000, 0, 0];
> c := [50, 40];
> LinearProgramming(A,b,c);;
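For a problem with two variables, the formulation Ax >= b with c*x maximal can be checked by enumerating the corners defined by pairs of rows of A. The Python sketch below does exactly that. It is a brute-force illustration, not the simplex method, it does not detect unbounded problems, and lp_2d_corners is a hypothetical name.

```python
from itertools import combinations


def lp_2d_corners(A, b, c, eps=1e-9):
    """Maximise c*x subject to A x >= b for two variables.

    Returns [x, rows], where rows is the set of 1-based row indices of A
    that define the best feasible corner, or None if no corner is feasible.
    """
    best = None
    for i, j in combinations(range(len(A)), 2):
        (a11, a12), (a21, a22) = A[i], A[j]
        det = a11 * a22 - a12 * a21
        if abs(det) < eps:
            continue  # parallel constraints do not define a corner
        # Cramer's rule for the 2 x 2 system formed by rows i and j.
        x = [(b[i] * a22 - a12 * b[j]) / det,
             (a11 * b[j] - b[i] * a21) / det]
        feasible = all(r[0] * x[0] + r[1] * x[1] >= bk - eps
                       for r, bk in zip(A, b))
        if not feasible:
            continue
        value = c[0] * x[0] + c[1] * x[1]
        if best is None or value > best[0]:
            best = (value, x, {i + 1, j + 1})
    return None if best is None else [best[1], best[2]]
```

With the A, b and c of the example above, this sketch finds the corner x = [375, 250], defined by rows 1 and 2, with objective value 28750.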
See Also:
?Cholesky ?EvolutionaryOptimization ?Identity ?SvdAnalysis
?convolve ?GaussElim ?matrix ?transpose
?Eigenvalues ?GivensElim ?matrix_inverse
LinearRegression
Function LinearRegression - Compute a linear regression
Calling Sequence: LinearRegression(y,x1,...)
Parameters:
Name Type Description
----------------------------------------------------------
y array(numeric) dependent variable
y table dependent data are values in table
x1 array(numeric) independent variable(s)
Returns:
array(numeric)
Global Variables: SumSq
Synopsis: Computes a linear regression y = a0 + a1*x1 + a2*x2 + ... by least
squares. The number of arguments is variable, it should be at least 2.
LinearRegression returns the vector [a0,a1,a2,...]. The global variable
SumSq is set to the sum of squares of errors in the regression.
Alternatively, if only one argument is provided, and it is a table, the
regression will be made as if the table values were the dependent variable
and the table arguments were the independent variable(s). Hence the
arguments of the table must be either numbers or lists of numbers,
consistently.
Examples:
> LinearRegression( [2.1,3.01,3.9,4.89], [0,1,2,3] );
[2.0860, 0.9260]
> SumSq;
0.00232000
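For a single independent variable the regression has a closed form, which makes the example easy to check by hand. The Python sketch below covers only that one-variable case and returns the sum of squared errors alongside the coefficients instead of setting a global variable; simple_linear_regression is a hypothetical name.

```python
def simple_linear_regression(y, x):
    """Fit y = a0 + a1*x by least squares; return ([a0, a1], sum_sq)."""
    n = len(y)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = my - a1 * mx
    # Sum of squared residuals, the quantity Darwin stores in SumSq.
    sum_sq = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
    return [a0, a1], sum_sq
```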
See also: ?ExpFit ?ExpFit2 ?Stat ?SvdAnalysis
LnGamma
Function LnGamma - logarithm of the Gamma and Incomplete Gamma functions
Calling Sequence: LnGamma(a)
LnGamma(a,x)
Parameters:
Name Type Description
-----------------------------------------------------------------------------
a numeric a numerical value
x nonnegative a nonnegative argument for the Incomplete Gamma function
Returns:
numeric
Synopsis: For a positive integer a, LnGamma returns the logarithm of the
product of 1*2*3*...*(a-1) = ln( (a-1)! ). LnGamma satisfies the functional
equation:
LnGamma(a+1) = ln(a) + LnGamma(a) = ln(Gamma(a+1))
For non-integer negative values, LnGamma returns the logarithm of the absolute
value of Gamma. LnGamma is used to compute factorials or combinatorial
numbers when the results are too large to be represented as floating point
numbers. LnGamma will compute results for virtually all possible arguments.
When LnGamma is used with two arguments, it is understood to be the logarithm
of the Incomplete Gamma function, defined by the integral:
                         infinity
                        /
                       |    (a - 1)
   LnGamma(a, x) = ln( |   t        exp(-t) dt )
                       |
                      /
                      x
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 6.1, 6.5.3
Examples:
> LnGamma(2);
0
> LnGamma(-100.5);
-364.9010
> LnGamma(15000);
129233.1932
> LnGamma(100,100);
358.4141
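The one-argument values can be cross-checked with the log-gamma function of Python's standard library, which also returns the logarithm of the absolute value for negative non-integers. The helper ln_binomial is a hypothetical example of the combinatorial use mentioned above; the incomplete (two-argument) form is not covered by this sketch.

```python
import math


def ln_binomial(n, k):
    """Logarithm of the binomial coefficient C(n, k) via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


# Functional equation from the synopsis: LnGamma(a+1) = ln(a) + LnGamma(a)
a = 7.25
assert abs(math.lgamma(a + 1) - (math.log(a) + math.lgamma(a))) < 1e-12
```

Working in logarithms keeps quantities such as C(100000, 50000) representable, although the coefficient itself overflows a floating point number.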
See also: ?factorial ?Gamma ?Lngamma
Lngamma
Function Lngamma - logarithm of the complement of the Gamma function
Calling Sequence: Lngamma(a,x)
Parameters:
Name Type Description
-----------------------------------------------------------------------------
a positive a numerical value
x nonnegative a nonnegative argument for the Incomplete Gamma function
Returns:
numeric
Synopsis: Lngamma is the logarithm of the complement with respect to Gamma(a)
of the Incomplete Gamma function:
Lngamma(a,x) = ln( Gamma(a) - Gamma(a,x) )
                         x
                        /
                       |    (a - 1)
   Lngamma(a, x) = ln( |   t        exp(-t) dt )
                       |
                      /
                      0
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 6.5.2
Examples:
> Lngamma(2,3);
-0.2221
> ln( Gamma(2) - Gamma(2,3) );
-0.2221
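The same value can be obtained from the standard power series of the lower incomplete gamma function, gamma(a,x) = x^a exp(-x) * sum over n >= 0 of x^n / (a (a+1) ... (a+n)). The Python sketch below evaluates that series in logarithmic form. It is one possible method, not necessarily the one Darwin uses, and ln_lower_gamma is a hypothetical name.

```python
import math


def ln_lower_gamma(a, x, tol=1e-16, max_terms=10000):
    """ln of the integral of t^(a-1) exp(-t) dt from 0 to x (a > 0, x >= 0)."""
    if a <= 0 or x < 0:
        raise ValueError("requires a > 0 and x >= 0")
    if x == 0:
        return -math.inf
    term = 1.0 / a  # n = 0 term of the series
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return a * math.log(x) - x + math.log(total)
```

For a = 2 the integral has the closed form 1 - (1 + x) exp(-x), which gives ln(1 - 4 exp(-3)) at x = 3, in agreement with the example.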
See also: ?factorial ?Gamma ?LnGamma
LoadMatrixFile
Function LoadMatrixFile - Loads a substitution rate matrix and character
frequencies from a file.
Calling Sequence: LoadMatrixFile(f)
Parameters:
Name Type Description
----------------------------
f string path to file
Returns:
Q : freq
Synopsis: The function LoadMatrixFile reads a matrix file in PAML compatible
format. It computes the substitution rate matrix and returns it, together
with the character frequency vector.
It is assumed that the order of amino acids and codons is always the same and
the matrix is re-ordered to correspond to the order used by Darwin.
Examples:
> LoadMatrixFile('matrices/wag.dat');
See also:
LocalNucPepAlign
Function LocalNucPepAlign
Calling Sequence: LocalNucPepAlign(npm,D)
Parameters:
Name Type
------------------
npm NucPepMatch
D DayMatrix
Returns:
NucPepMatch
Synopsis: Return the NucPepMatch between the nucleotide and the peptide of
npm with the highest score.
Examples:
See Also:
?AlignNucPepAll ?GetPeptides ?VisualizeGene
?FindNucPepPam ?GlobalNucPepAlign ?VisualizeProtein
?Gene ?LocalNucPepAlignBestPam
?GetIntrons ?NucPepMatch
LocalNucPepAlignBestPam
Function LocalNucPepAlignBestPam
Calling Sequence: LocalNucPepAlignBestPam(m)
Parameters:
Name Type
------------------
m NucPepMatch
Returns:
NucPepMatch
Synopsis: Apply LocalNucPepAlign and FindNucPepPam until a maximum is found.
Examples:
See Also:
?AlignNucPepAll ?GetPeptides ?VisualizeGene
?FindNucPepPam ?GlobalNucPepAlign ?VisualizeProtein
?Gene ?LocalNucPepAlign
?GetIntrons ?NucPepMatch
LockFile
Function LockFile - creates an exclusive lock file
Option: builtin
Calling Sequence: LockFile(filename,message)
Parameters:
Name Type Description
----------------------------------------------------------------
filename string the name of the exclusive lock file
message string (optional) a comment, added to the lock file
Returns:
boolean
Synopsis: This command creates a file with the given name which contains some
information about the process and, optionally, the given message. The
creation of this file is done in such a way that only one process will
succeed in creating such a file, even if several are competing for the same
filename. This implements an exclusive lock mechanism or a semaphore. The
command returns true when it was successful in securing the lock and false
otherwise. The file will contain a single line with the hostname,
process id number, date and any given message. It is guaranteed that only
one process will be successful with a given lock file. This will work on
file systems which implement the exclusive locking mechanism provided by
fcntl (see "man 2 fcntl" in unix/linux).
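The "only one process succeeds" behaviour can be illustrated in Python with an exclusive create. Note that this sketch relies on the O_CREAT | O_EXCL flags of open(2) rather than on the fcntl mechanism named above, so it is an analogue of LockFile, not a reimplementation; lock_file is a hypothetical name and the exact line format is an assumption.

```python
import os
import socket
import time


def lock_file(filename, message=""):
    """Try to create filename exclusively; return True on success."""
    try:
        # O_EXCL makes the call fail if the file already exists, so at most
        # one competing process can create it.
        fd = os.open(filename, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        # Single line: hostname, process id, date and the optional message.
        f.write(f"{socket.gethostname()} {os.getpid()} "
                f"{time.ctime()} {message}\n")
    return True
```

The lock is released by deleting the file.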
Examples:
See Also:
?FileStat ?OpenWriting ?ReadRawLine ?SplitLines
?inputoutput ?ReadData ?ReadURL
?OpenAppending ?ReadLine ?SearchDelim
?OpenReading ?ReadRawFile ?ServerSocket
LongInteger
Function LongInteger( s )
Data structure LongInteger( ... )
Representation of integers which could exceed the 53 bits of precision
available with IEEE double precision floating point numbers. Operations with
LongIntegers are contagious, that is to say that any arithmetic operation
with at least one LongInteger argument will return a LongInteger result.
This implementation is OO and any program/function working correctly for
integers, should work correctly when the input contains LongIntegers (with
the obvious differences accounted for additional precision).
- Operations:
Initialization: a := LongInteger( integer )
a := LongInteger( string )
a := LongInteger( i1, i2, ... )
The first case transforms the integer argument to the long precision
format. The second format accepts a string which should contain an
integer (possibly signed) of arbitrary length. The third case is
to build a long precision integer when its representation base
LongInteger_base is known.
LongIntegers are represented by a LongInteger structure having the
following properties:
a := LongInteger( i1, i2, i3, .... , i[k] );
value: i1 + i2*LongInteger_base + i3*LongInteger_base^2 + ...
assertions: -LongInteger_base/2 <= i[j] <= LongInteger_base/2
i[k] <> 0 (except for the representation of 0)
Arithmetic operations:
a+b, a-b, a*b, iquo(a,b), a^b, mod(a,b), |a|
(powering is only supported for positive exponents)
Boolean operations:
a = b, a <= b, a < b
Special functions Rand(LongInteger)
Printing: print(a);
printf( '%d', a );
Type testing: type(a,LongInteger);
- Conversions:
To string : string(a)
numeric : numeric(a)
- Selectors:
no selectors
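The representation described above (digits i1, i2, ... with -LongInteger_base/2 <= i[j] <= LongInteger_base/2 and a non-zero last digit) is a balanced positional system. A minimal Python sketch of the conversion in both directions follows. The base 10^4 is a placeholder, since the actual value of LongInteger_base is not stated here, and to_digits and from_digits are hypothetical names.

```python
BASE = 10 ** 4  # placeholder; the real LongInteger_base is not given here


def to_digits(n, base=BASE):
    """Balanced digits of n, least significant first; zero is [0]."""
    digits = []
    while n != 0:
        r = n % base        # 0 <= r < base in Python, also for negative n
        if r > base // 2:
            r -= base       # shift into the balanced range
        digits.append(r)
        n = (n - r) // base
    return digits or [0]


def from_digits(digits, base=BASE):
    """Value i1 + i2*base + i3*base^2 + ..."""
    return sum(d * base ** k for k, d in enumerate(digits))
```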
See also: ?Inherit ?integer ?LLL
MAlign
Function MAlign - multiple sequence alignment using various methods
Calling Sequence: MAlign(seqs,method,labels,tree,allall)
Parameters:
Name Type Description
------------------------------------------------------------------------------
seqs list(string) sequences to align
method string (optional) method(s) to compute the alignment
labels list(string) (optional) labels for the sequences
tree Tree (optional) Tree used by the prob method
allall matrix({0,Alignment}) (optional) all-against-all Alignments
Returns:
MAlignment
Global Variables: MSA_CircularTour
Synopsis: MAlign does a multiple sequence alignment (MSA) using the given
method(s). The valid methods are:
prob Probabilistic method to build MSA
circ Circular tour method to build MSA
best Chooses the best of 4 methods (expensive)
Global Global alignments between sequences
Local Local alignments between sequences
CFE Cost Free End alignments between sequences
GapHeuristic Use gap heuristics to improve the result
If a method is not specified, the probabilistic method will be used. The
GapHeuristic can be specified in addition to the other method specification.
If a tree is not provided, it will be calculated (for the Probabilistic method
which needs a tree).
If an all-against-all Alignment array is not provided, one will be calculated.
With the method best, 4 different multiple sequence alignments will be
computed (circular, and probabilistic with Local, CFE and Global) and the
best scoring one will be returned. GapHeuristics are used for the 4
methods. This is naturally 4 times more expensive than a single alignment
and should be used with care.
Examples:
> msa := MAlign(['ASDFAA','ASDAV','ASFDAA']):;
dimensionless fitting index 73.14
> print(msa);
Multiple sequence alignment:
----------------------------
Score of the alignment: 14.782993
Maximum possible score: 23.171372
Sequence 1 _ASDFAA
Sequence 2 _ASDAV_
Sequence 3 ASFDAA_
> msa := MAlign(['ASDFAA','ASDAV','ASFDAA'], 'circ'):;
> print(msa);
Multiple sequence alignment:
----------------------------
Score of the alignment: 1.8851224
Maximum possible score: 15.639881
Sequence 1 ASDFAA
Sequence 2 ASD_AV
Sequence 3 ASFDAA
See Also:
?Align ?Clusters ?DynProgStrings
?Alignment ?DynProgScore ?MAlignment
MAlignment
Class MAlignment - a protein or DNA multiple sequence alignment
Template: MAlignment(InputSeqs,AlignedSeqs,labels,method,PrintOrder,Score,
UpperBound,tree,AllAll)
Fields:
Name Type Description
-----------------------------------------------------------------------
InputSeqs list(string) input sequences (before alignment)
AlignedSeqs list(string) aligned sequences (in input order)
labels list(string) labels for the sequences (in input order)
method string method(s) that generated the MSA
PrintOrder list(integer) order used for printing and scoring
Score numeric score of the MSA (circular tour)
UpperBound numeric upper bound score (circular tour)
tree Tree tree used by the probabilistic method
AllAll matrix all against all Alignment matrix
Methods: MAlignment_type PartialOrderMSA print Rand select string
Synopsis: An MAlignment stores the information of a multiple sequence
alignment. The sequences may contain proteins or DNA. The Score and
UpperBound (on the score) are calculated using the circular tour method. In
order to force recalculation of the score, use the selector RecalcScore. An
MAlignment is normally created by calling MAlign.
See also: ?Align ?Alignment ?MAlign
MLTopoTest
Function MLTopoTest - Run KH test on a prespecified tree and the ML tree over an
MAlignment and return whether the null hypothesis is
rejected or not.
Calling Sequence: MLTopoTest(msa,t1)
Parameters:
Name Type Description
------------------------------------------------------------------------
msa MAlignment Multiple sequence alignment
t1 Tree Input tree
subst string (optional) Substitution model for PhyML (LG)
nrOfBootraps posint (optional) Number of bootstraps (100)
sigLevel numeric (optional) Significance level
Returns:
boolean
Synopsis: Run the KH (Kishino-Hasegawa) test on an a priori tree and the ML
tree over an MAlignment and return whether the null hypothesis is rejected or
not. MLTopoTest returns true if the null hypothesis is rejected, false
otherwise. PhyML is employed to do the likelihood maximization and must be
installed in order to use this function.
References: Goldman N., Anderson J.P., Rodrigo A.G. Likelihood-Based Tests of
Topologies in Phylogenetics, Systematic Biology, 49:652-670, 2000
Examples:
> ReadProgram('datasets/quartet1/trees.drw');;
> msa := ReadFastaIntoMAlignment('datasets/quartet1/MSA_1.fa');;
> lprint('ML KH test', MLTopoTest(msa,tree1));;
MSAMethod
Data structure MSAMethod( )
Function: creates a data structure for MSA construction
Selectors:
Method: String
"PROB", "CLUSTAL", "MSA", "REPEATED" or any combination with
"GAP", e.g. "PROB GAP"
Default: "PROB GAP"
Gap: GapHeuristics()
If GAP is used in Method, the GapHeuristics data structure is used
MSAStatistics
Data structure MSAStatistics( )
Data structure that keeps statistical data about MSA constructions and methods
Selectors:
Type: Tree
Information on the Tree that was used
Construction: TreeConstruction
Information about the TreeConstruction type that was used
Method: MSAMethod
Type of MSA Method that was used
Real: Integer
Number of best msa constructions
Total: Integer
Total number of msas constructed
Score: Stat()
Average Score of msa
Deltascore: Stat()
Difference of real score minus calculated score
Name: string
Name/Title of these statistics
MST
Function MST - Minimum-Spanning Tree algorithm
Calling Sequence: MST(A)
Parameters:
Name Type Description
--------------------------
A Graph a Graph
Returns:
Graph
Synopsis: The input to this algorithm is an undirected graph. It computes the
minimum spanning tree according to Prim's algorithm. The implementation has
a time complexity of O(|V|^2*log(|V|)), whereas the theoretical lower bound is
O(|E|). Therefore, this implementation is relatively good when working with
dense graphs, in which case |E| is O(|V|^2).
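Illustration (Python, not Darwin): a minimal sketch of the array-based variant
of Prim's algorithm, which needs O(|V|^2) steps and therefore suits dense
graphs. The name prim_mst and the (weight, u, v) edge tuples are assumptions
made for this sketch; it is not the implementation used by MST.

```python
import math

def prim_mst(nodes, edges):
    """Return the edges (weight, u, v) of a minimum spanning tree.

    nodes: iterable of hashable node ids; edges: list of (weight, u, v).
    The graph is assumed to be undirected and connected.
    """
    nodes = list(nodes)
    # Adjacency map that keeps the lightest edge between each pair of nodes.
    adj = {u: {} for u in nodes}
    for w, u, v in edges:
        if v not in adj[u] or w < adj[u][v]:
            adj[u][v] = w
            adj[v][u] = w
    start = nodes[0]
    # best[v] = (weight, tree node) of the cheapest edge linking v to the tree.
    best = {v: (adj[start].get(v, math.inf), start) for v in nodes if v != start}
    tree = []
    while best:
        v = min(best, key=lambda x: best[x][0])   # O(|V|) scan per added node
        w, u = best.pop(v)
        if w == math.inf:
            raise ValueError("graph is not connected")
        tree.append((w, u, v))
        # Newly added node v may offer cheaper links to the remaining nodes.
        for x, wx in adj[v].items():
            if x in best and wx < best[x][0]:
                best[x] = (wx, v)
    return tree
```

A spanning tree of a connected graph with |V| nodes always has |V|-1 edges,
which is a quick sanity check on the result.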
Examples:
> hex := HexahedronGraph();
hex := Graph(Edges(Edge(0,1,2),Edge(0,1,4),Edge(0,1,5),Edge(0,2,3),Edge(0,2,6),Edge(0,3,4),Edge(0,3,7),Edge(0,4,8),Edge(0,5,6),Edge(0,5,8),Edge(0,6,7),Edge(0,7,8)),Nodes(1,2,3,4,5,6,7,8))
> MST(hex);
Graph(Edges(Edge(0,1,2),Edge(0,2,3),Edge(0,1,4),Edge(0,1,5),Edge(0,2,6),Edge(0,3,7),Edge(0,4,8)),Nodes(1,2,3,4,5,6,7,8))
See Also:
?BipartiteGraph ?Graph_minus ?ParseDimacsGraph
?Clique ?Graph_Rand ?Path
?DrawGraph ?Graph_XGMML ?RegularGraph
?Edge ?InduceGraph ?ShortestPath
?EdgeComplement ?MaxCut ?TetrahedronGraph
?Edges ?MaxEdgeWeightClique ?VertexCover
?FindConnectedComponents ?MinCut
?Graph ?Nodes
Machine
Class Machine - structure to hold Machine references
Template: Machine(Name,User,Class,Processes,MaxProcesses,LoginControl,
OffHours,LoadRange,ForcedRun,NiceValue,StartCycle,DownCount,
LastProcess)
Fields:
Name Type
-------------------------------
Name string
User string
Class integer
Processes list(Process)
MaxProcesses posint
LoginControl boolean
OffHours integer..integer
LoadRange numeric..numeric
ForcedRun boolean
NiceValue integer
StartCycle numeric
DownCount integer
LastProcess integer
Returns:
Machine
Methods: Machine_type select
Synopsis: This data structure holds information about a particular machine
(computer). The main application is for parallel processing and hence it
contains all sorts of controlling information.
See also: ?darwinipc ?ParExec2 ?Process
MafftMSA
Function MafftMSA - Multiple sequence alignment using Mafft
Calling Sequence: MafftMSA(seqs,labels,dm)
Parameters:
Name Type Description
--------------------------------------------------------------------
seqs list(string) sequences to align
labels list(string) (optional) labels for the sequences
dm DayMatrix (optional) Dayhoff matrix used for alignment
Returns:
MAlignment
Synopsis: MafftMSA computes a multiple sequence alignment (MSA). If no
Dayhoff matrix is passed, mafft uses the BLOSUM62 scoring matrix. Since mafft
does not return a score of the MSA, the score and upper bound score in the
MAlignment data structure are left undefined. The function works only on
unix/linux, and assumes that Mafft is available. Information and source of
mafft is available from 'http://align.bmr.kyushu-u.ac.jp/mafft/software/'.
Examples:
> msa := MafftMSA(['ASDFAARA','ASDAVRA','ASFDAATA']);
> print(msa);
Multiple sequence alignment:
----------------------------
Score of the alignment: 0
Maximum possible score: 1.7976931e+308
1 ASDFAARA
2 AS_DAVRA
3 ASFDAATA
See also: ?Align ?Alignment ?MAlign ?MAlignment
MapleFormula
Class MapleFormula - mathematical formula given in Maple format
Template: MapleFormula(string)
Fields:
Name Type
-------------------------------------
string math formula in maple format
Returns:
MapleFormula
Methods: HTMLC MapleFormula_type Rand string
Synopsis: A MapleFormula object is constructed with a single argument, the
formula that is to be sent to Maple for "nice" text output formatting.
Examples:
> M := MapleFormula('sum(i,i=1..10)'):
> print(M);
10
-----
\
) i
/
-----
i = 1
See Also:
?Block ?HTML ?Paragraph ?Table
?Code ?HyperLink ?PostscriptFigure ?TT
?Color ?Indent ?print ?View
?Copyright ?LastUpdatedBy ?Roman
?DocEl ?latex ?RunDarwinSession
?Document ?List ?screenwidth
Match
Class Match - Structure data type to hold peptide/peptide matches
Template: Match(Offset1,Offset2)
Match(Sim,Offset1,Offset2)
Match(Sim,Offset1,Offset2,Length1,Length2)
Match(Sim,Offset1,Offset2,Length1,Length2,pam)
Match(Sim,Offset1,Offset2,Length1,Length2,PamNumber,PamVariance)
Fields:
Name Type Description
--------------------------------------------------------------------------
Sim numeric similarity score of the Match
Offset1 posint offset of the first sequence in the database
Offset2 posint offset of the second sequence in the database
Length1 posint length of the match of the first sequence
Length2 posint length of the match of the second sequence
PamNumber numeric Estimate of the PAM distance between the sequences
PamVariance numeric Estimate of the PAM variance between the sequences
Returns:
Match
Methods: AC Alignment Entry ID Match_type print Sequence
Synopsis: The Match structure holds all the necessary information for the
alignment of two peptide sequences. The offsets are positions into a
peptide database, hence Match requires that an appropriate database has been
loaded. The offsets are relative to the system variable DB. Typically,
Match structures are initialized by giving only the two offsets. The
remaining fields are completed by one of several alignment algorithms.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> m:=Match( Sequence(Entry(1)), Sequence(Entry(2)) );
m := Match(376,1836)
> m2 := Match( GetOffset('UTTUWPC'), Sequence(Entry(20)));
m2 := Match(377757968,19068)
See also: ?GetOffset ?MAlign ?NucPepMatch ?ReadDb ?TotalAlign
MatchRegex
Function MatchRegex - matches a regex in a string
Option: builtin
Calling Sequence: MatchRegex(pat,txt)
Parameters:
Name Type Description
---------------------------------------------
pat string a regex pattern to be matched
txt string a text which is searched
Returns:
list(string)
Synopsis: This function matches a regex pattern string in the POSIX Extended
Regular Expression syntax in a query string. The matching is case sensitive.
If the pattern cannot be matched, the empty list is returned.
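Illustration (Python, not Darwin): a sketch of the return convention described
above, where the first element is the whole match and the following elements
are the parenthesized groups. It relies on Python's re module, whose syntax and
matching rules are not identical to POSIX Extended Regular Expressions, so it
is only an approximation. The name match_regex is an assumption.

```python
import re

def match_regex(pat, txt):
    """Return [whole match, group 1, group 2, ...] or [] if pat does not match."""
    m = re.search(pat, txt)          # case-sensitive search anywhere in txt
    if m is None:
        return []
    # Groups that did not take part in the match are reported as empty strings.
    return [m.group(0)] + [g if g is not None else "" for g in m.groups()]
```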
Examples:
> MatchRegex('^a(b|c*)e', 'accceb');
[accce, ccc]
> MatchRegex('([a-c]*)de(a.*)', 'xacccdeabbb');
[acccdeabbb, accc, abbb]
> MatchRegex('[A-D]a', 'acccda');
[]
See Also:
?BestSearchString ?SearchApproxString ?SearchString
?CaseSearchString ?SearchDelim
?HammingSearchString ?SearchMultipleString
Matrices
Function Matrices
Calling Sequence: Matrices()
Returns:
NULL
Synopsis: This function loads various peptide scoring matrices including the
Gonnet/Benner PAM matrices, Blosum{50,60,62,70}, UNITARY, UNITARY2, RDDH250
(see `Amino Acid Substitutions in Structurally Related Proteins', JMB (1988)
204, 1019-1029. by Risler, Delorme, Delacroix and Henaut.), PIMA.
See also: ?CreateDayMatrices ?CreateDayMatrix ?DayMatrix
MaxCut
Function MaxCut - Approximate max-cut algorithm
Calling Sequence: MaxCut(G)
MaxCut(G,weighted)
Parameters:
Name Type Description
-------------------------------------------------------
G Graph a Graph
weighted boolean (optional) compute weighted maxcut
Returns:
list : [set,set,numeric]
Synopsis: MaxCut is the problem of computing the maximum cut of an undirected
graph G(V,E), i.e., that of partitioning the vertex set V into two parts so
that the number (resp. weights) of edges joining vertices in different parts
is as large as possible. It is known to be NP-hard.
This greedy approximation algorithm solves the unweighted MaxCut problem in
O(e+n) (weighted O(e*log(e)+n)) and is a 1/2+1/(2n) approximation. The
weighted form of the algorithm expects numeric Label fields in the graph
data-structure.
The algorithm returns the two disjoint vertex sets and the number (resp.
weights) of the edges crossing the two sets.
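Illustration (Python, not Darwin): a sketch of one simple greedy scheme for
MaxCut. Each vertex is placed on the side opposite to the heavier group of its
already placed neighbours, so at least half of the total (non-negative) edge
weight ends up crossing the cut. This is an assumed textbook variant, not
necessarily the algorithm used by MaxCut; the name greedy_max_cut and the edge
tuple layout are hypothetical.

```python
def greedy_max_cut(nodes, edges):
    """edges: list of (u, v) or (u, v, weight). Return [set, set, cut weight]."""
    nodes = list(nodes)
    adj = {u: [] for u in nodes}
    for e in edges:
        u, v = e[0], e[1]
        w = e[2] if len(e) > 2 else 1      # unweighted edges count as 1
        adj[u].append((v, w))
        adj[v].append((u, w))
    side = {}
    for u in nodes:
        # Weight of edges from u to neighbours already placed on each side.
        to_a = sum(w for v, w in adj[u] if side.get(v) == 0)
        to_b = sum(w for v, w in adj[u] if side.get(v) == 1)
        # Put u opposite the heavier side so that those edges cross the cut.
        side[u] = 1 if to_a >= to_b else 0
    a = {u for u in nodes if side[u] == 0}
    b = {u for u in nodes if side[u] == 1}
    cut = sum((e[2] if len(e) > 2 else 1)
              for e in edges if side[e[0]] != side[e[1]])
    return [a, b, cut]
```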
Examples:
> G := Rand(Graph):
> MaxCut(G);
[{2,5,7,9,10}, {1,3,4,6,8}, 15]
See Also:
?BipartiteGraph ?Graph_minus ?ParseDimacsGraph
?Clique ?Graph_Rand ?Path
?DrawGraph ?Graph_XGMML ?RegularGraph
?Edge ?InduceGraph ?ShortestPath
?EdgeComplement ?MaxEdgeWeightClique ?TetrahedronGraph
?Edges ?MinCut ?VertexCover
?FindConnectedComponents ?MST
?Graph ?Nodes
MaxEdgeWeightClique
Function MaxEdgeWeightClique - Maximum edge-weight clique approximate
algorithm
Option: builtin
Calling Sequence: MaxEdgeWeightClique(A)
Parameters:
Name Type Description
-------------------------------------------------
A Graph a Graph with positive edge weights
Returns:
set
Synopsis: The input to this algorithm is an undirected graph. An undirected
graph is represented as a Graph data structure which should accept three
selectors: Nodes, Edges and Weight. An approximation algorithm is used to
find the best Clique. The global variable CliqueIterFactor may be assigned
a non-negative number f. The larger f, the more accurate the answers will
be, and the more time the algorithm will consume. The default behaviour is
identical to setting CliqueIterFactor to 1. The current version does part
of the searching for the best solution in a random way, so for large
problems, different runs may give different results. This allows the
algorithm to be run in parallel if necessary. For convenience, the global
variable TotalEdgeWeight is assigned the sum of edge-weights of the clique
found.
See Also:
?BipartiteGraph ?Graph_minus ?ParseDimacsGraph
?Clique ?Graph_Rand ?Path
?DrawGraph ?Graph_XGMML ?RegularGraph
?Edge ?InduceGraph ?ShortestPath
?EdgeComplement ?MaxCut ?TetrahedronGraph
?Edges ?MinCut ?VertexCover
?FindConnectedComponents ?MST
?Graph ?Nodes
MaxLikelihoodSize
Function MaxLikelihoodSize
Calling Sequence: MaxLikelihoodSize(m,k)
Parameters:
Name Type
-------------------------------------------
m posint, the number of balls
k posint, the number of occupied boxes
Returns:
posint
Synopsis: MaxLikelihoodSize determines the number of boxes from a "balls in
boxes" experiment by maximum likelihood. When m balls are randomly
distributed and they occupy k boxes this function returns the most likely
total number of boxes. An application of this is the estimation of how many
local minima a function may have, when m random searches find k different
minima. If m=k, the determination is not possible (result is infinity) and
the function returns DBL_MAX. The probability of m balls randomly
distributed among n boxes using k of them is:
Pr{n,m,k} = stirling2(m,k) * n(n-1)...(n-k+1) / n^m
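Illustration (Python, not Darwin): a sketch of the maximum likelihood search
implied by the formula above. The factor stirling2(m,k) does not depend on n,
so only n(n-1)...(n-k+1) / n^m has to be maximized. The sketch walks n upwards
from k while the likelihood still increases, assuming a single peak, and uses
exact integer arithmetic. It returns math.inf for m=k where the text above
mentions DBL_MAX. The name max_likelihood_size is an assumption.

```python
import math

def max_likelihood_size(m, k):
    """Most likely number of boxes n when m balls occupy k distinct boxes."""
    if not 1 <= k <= m:
        raise ValueError("need 1 <= k <= m")
    if k == m:
        return math.inf              # the likelihood keeps growing with n
    n = k
    # With L(n) = n(n-1)...(n-k+1) / n^m:
    #   L(n+1) > L(n)  <=>  (n+1) * n^m > (n-k+1) * (n+1)^m
    while (n + 1) * n**m > (n - k + 1) * (n + 1)**m:
        n += 1
    return n
```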
Examples:
> MaxLikelihoodSize(10,9);
42
> MaxLikelihoodSize(10,10);
1.7976931348623147e+308
> MaxLikelihoodSize(120,100);
316
See also: ?ProbBallsBoxes ?ProbCloseMatches
MaximizeFunc
Function MaximizeFunc
Calling Sequence: MaximizeFunc(f,r,tol)
MaximizeFunc(f,r)
Parameters:
Name Type
-----------------------
f procedure
r numeric..numeric
tol numeric >= 0
Returns:
[x, f(x)]
Synopsis: This function finds the maximum [x, f(x)] of a convex function f
over a range r within an absolute accuracy of tol. If tol is not given as a
parameter, the result is of machine accuracy.
Examples:
> MaximizeFunc(x -> sin(x), 0..1);
[1.0000, 0.8415]
> MaximizeFunc(x -> x^2 - 3*x, -2..0);
[-2.0000, 10.0000]
See Also:
?BFGSMinimize ?MaxLikelihoodSize ?MinimizeBrent ?MinimizeSD
?DisconMinimize ?Minimize2DFunc ?MinimizeFunc ?NBody
MaximizeRD
Function MaximizeRD
Calling Sequence: MaximizeRD(ini,f,ran,MaxHours)
Parameters:
Name Type Description
-------------------------------------------------------------------
ini anything Initial solution
f procedure Function to be optimized
ran procedure Procedure that returns a new direction
MaxHours positive Optional, limit of computation time in hours.
Returns:
point:type(ini)
Synopsis: This function finds the point (in a potentially high dimensional
space) that maximizes f using random directions.
The input "ini" can be of any type that accepts linear operations (type(ini)
could be numerical, list(numerical),matrix(numerical) or anything which
accepts addition of similar objects and multiplication by numerical
constants).
The function f takes a single argument of type(ini) and returns a numerical
value. f(ini) is the initial value. The function f does not need to be
continuous. It is common to have f returning -DBL_MAX when the argument
is out of the valid range.
The procedure ran returns an object of type(ini) that provides a random
direction in the space of the arguments. ran
is called with an argument which is the most recent optimal point. This
is useful when the generation of the random direction requires
information about the point. Let
d := ran( pt );
Then,
f( pt + h*d )
are the points that will be explored, that is starting from pt following the
direction d. It is clear that pt + h*d has to be computable or in other
words (h is numeric) that type(ini) is an object which accepts linear
operations. (type(ini) could be numerical, list(numerical), matrix
(numerical) or anything which accepts addition of similar objects and
multiplication by numerical constants).
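Illustration (Python, not Darwin): a sketch of a random-direction search of the
kind described above, for points represented as lists of floats. It probes
pt + h*d and pt - h*d, doubles h after an improvement and halves it otherwise.
The step-size rule, the iteration limit max_iter (used here in place of
MaxHours) and the name maximize_rd are assumptions, not the actual MaximizeRD
strategy.

```python
import random

def maximize_rd(ini, f, ran, max_iter=2000, h0=1.0):
    """Maximize f starting from ini (list of floats) along random directions."""
    pt, best = list(ini), f(ini)
    h = h0
    for _ in range(max_iter):
        d = ran(pt)                       # direction may depend on the point
        improved = False
        for step in (h, -h):              # probe both ways along d
            cand = [p + step * di for p, di in zip(pt, d)]
            val = f(cand)
            if val > best:
                pt, best, improved = cand, val, True
                break
        # Lengthen the step after a success, shorten it after a failure.
        h = h * 2.0 if improved else h * 0.5
        if h < 1e-12:                     # steps too small to matter
            break
    return pt
```

A direction generator such as lambda pt: [random.gauss(0, 1) for _ in pt]
ignores the current point; a problem-specific ran could use it instead.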
Examples:
> MaximizeFunc(x -> sin(x), 0..1);
[1.0000, 0.8415]
> MaximizeFunc(x -> x^2 - 3*x, -2..0);
[-2.0000, 10.0000]
See Also:
?BFGSMinimize ?MaxLikelihoodSize ?MinimizeFunc
?DisconMinimize ?Minimize2DFunc ?MinimizeSD
?MaximizeFunc ?MinimizeBrent ?NBody
MinCut
Function MinCut - Approximate min-cut algorithm
Calling Sequence: MinCut(G)
MinCut(G,errbound)
Parameters:
Name Type Description
-----------------------------------------------------------------------
G Graph a Graph
errbound nonnegative (optional) error bound for not finding minimum
Returns:
list : [integer, Nodes, Nodes]
Synopsis: MinCut is the problem of computing the minimal cut of an undirected
graph G(V,E), i.e., that of partitioning the vertex set V into two parts so
that the number of edges joining vertices in different parts is minimal.
This randomized algorithm computes a MinCut in O(n^2*log^3(n)). The
optional argument 'errbound' is used to set the number of trial runs and has
been empirically found to be very conservative.
The algorithm returns the number of edges which cross the cut and the two
disjoint vertex sets.
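Illustration (Python, not Darwin): a sketch of the basic randomized contraction
idea (Karger's algorithm) repeated over several trials. Edges are contracted in
random order until two groups of nodes remain, and the best cut over all trials
is kept. The number of trials plays a role similar to errbound, but the
mapping, the name karger_min_cut and the (u, v) edge pairs are assumptions;
MinCut itself may use a faster recursive variant.

```python
import random

def karger_min_cut(nodes, edges, trials=200, rng=random):
    """Return [cut size, side A, side B] for a connected undirected graph."""
    nodes = list(nodes)
    best = None
    for _ in range(trials):
        parent = {u: u for u in nodes}        # union-find over merged nodes

        def find(u):
            while parent[u] != u:
                parent[u] = parent[parent[u]]  # path halving
                u = parent[u]
            return u

        groups = len(nodes)
        order = list(edges)
        rng.shuffle(order)                     # random contraction order
        for u, v in order:
            if groups == 2:
                break
            ru, rv = find(u), find(v)
            if ru != rv:                       # contract this edge
                parent[ru] = rv
                groups -= 1
        crossing = sum(1 for u, v in edges if find(u) != find(v))
        if best is None or crossing < best[0]:
            root = find(nodes[0])
            a = {u for u in nodes if find(u) == root}
            best = [crossing, a, set(nodes) - a]
    return best
```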
Examples:
> G := Graph({{1,2},{2,3},{1,3},{1,4}},{1,2,3,4});
G := Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,1,4),Edge(0,2,3)),Nodes(1,2,3,4))
> MinCut(G);
[1, Nodes(1,2,3), Nodes(4)]
See Also:
?BipartiteGraph ?Graph_minus ?ParseDimacsGraph
?Clique ?Graph_Rand ?Path
?DrawGraph ?Graph_XGMML ?RegularGraph
?Edge ?InduceGraph ?ShortestPath
?EdgeComplement ?MaxCut ?TetrahedronGraph
?Edges ?MaxEdgeWeightClique ?VertexCover
?FindConnectedComponents ?MST
?Graph ?Nodes
MinSquareTree
Function LeastSquaresTree - compute a distance phylogenetic tree using least
squares
Option: builtin
Calling Sequence: LeastSquaresTree(Dist,Var)
LeastSquaresTree(Dist,Var,Labels)
LeastSquaresTree(Dist,Var,Labels,IniTree,Keep)
Parameters:
Name Type Description
---------------------------------------------------------------------------------
Dist matrix(numeric) Pairwise distances
Var matrix(numeric) Variances
Labels list Optional labels for the leaves
IniTree Tree Initial tree to optimize its branch lengths
IniTree 'Random' To start with a completely random tree
IniTree 'NJRandom' To start with a random Neighbour-joining like tree
IniTree 'Trials' = posint Run n trials with NJRandom and return the best tree
Keep 'KeepTopology' (optional) Optimize branch lengths only
Returns:
Tree
Synopsis: This function computes a binary tree which approximates the given
distances Dist by least squares. The distances are assumed to have a
variance given by the matrix Var. If a list Labels is given, the leaves of
the resulting tree are labelled with these values. The Leaf nodes produced
have 3 fields: (1) the label given (or their integer index if no Labels are
given), (2) the height of the Leaf and (3) their integer index. If the
global variable MinLen is assigned a positive value, it will determine the
minimum branch length. If not set, 1/1000th of the average distance between
leaves is used. The quality of the fit is measured by the sum of the
squares of the weighted deviations divided by (n-2)(n-3)/2. This value is
stored in the global variable MST_Qual. A dimensionless fitting index is
also computed, it is the MST_Qual / variance(Dist) * harmonic_mean(Var).
This value is printed and stored in the global variable DimensionlessFit.
Trees built over the same set of species, even with radically different
methods, can be ranked by the quality of their fit with this index. If the
fourth parameter has a Tree, then this tree is taken and optimized.
If the fourth argument is the word "Random", then the optimization is started
over a random tree. For large trees it makes sense to try several random
trees and choose the one with the best MST_Qual. When starting with random
trees, the global variable MST_Prob can be set to any numerical value
between 0 and 1. Values close to 1 select trees which are very close to the
one given by Neighbour Joining. Values close to 0 select completely random
trees. Leaving MST_Prob unassigned is equivalent to using NJRandom.
When "NJRandom" is used, a Neighbour-joining like tree is made with a variable
level of randomness at each step which may produce better random trees.
When the word KeepTopology is used, the optimization is done only on the
branch lengths. This is useful to optimize the branches of a given tree.
The function Tree_matrix extracts the distance matrix from a tree. It is sort
of the inverse of LeastSquaresTree.
Examples:
> D := [[0, 3, 13, 10], [3, 0, 14, 11], [13, 14, 0, 9], [10, 11, 9, 0]];
D := [[0, 3, 13, 10], [3, 0, 14, 11], [13, 14, 0, 9], [10, 11, 9, 0]]
> V := [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]];
V := [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
> LeastSquaresTree(D, V);
dimensionless fitting index 0
> t := LeastSquaresTree(D, V, [AA, BB, CC, DD]);
dimensionless fitting index 0
> print(Tree_matrix(t));
0 3 13 10
3 0 14 11
13 14 0 9
10 11 9 0
See Also:
?BootstrapTree ?Leaf ?Synteny
?ComputeDimensionlessFit ?PhylogeneticTree ?Tree
?DrawTree ?RBFS_Tree ?Tree_matrix
?GapTree ?SignedSynteny ?ViewPlot
Minimize2DFunc
Function Minimize2DFunc
Calling Sequence: Minimize2DFunc(f,x,y,prevpoints)
Parameters:
Name Type
-------------------------------------------------------------------
f a function with two arguments to be minimized
x an optional initial value for the first argument to f
y an optional initial value for the second argument to f
prevpoints an optional list of triplets, [x,y,f(x,y)]
Returns:
[numeric, numeric, numeric] : [x,y,f(x,y)], a triplet, where x,y is a local minimum of f
Synopsis: Minimize2DFunc minimizes the function f in two variables. If x,y
are given, the minimization starts at the point x,y. If additional points
are known, they can be included in the list prevpoints and they will not be
recomputed. To avoid a point (which, for example, has a singularity) the
point should be included in the prevpoints list with a very high (faked)
value. If no points are given, Minimize2DFunc starts at random values U(-1,
1) of x and y. Minimize2DFunc assumes that f(x,y) is very expensive to
compute and tries to do a minimum number of evaluations.
Examples:
> Minimize2DFunc((x,y) -> sin(x*y)+cos(y));
[0.5000, -3.1416, -2]
> Minimize2DFunc((x,y) -> x^4+3*y^2-x*y,1,2);
[0.2041, 0.03402069, -0.00173611]
See Also:
?BFGSMinimize ?MaximizeFunc ?MinimizeBrent ?MinimizeSD
?DisconMinimize ?MaxLikelihoodSize ?MinimizeFunc ?NBody
MinimizeBrent
Function MinimizeBrent - Univariate minimization using Brent's algorithm
Calling Sequence: MinimizeBrent(f,iniguess,incr,relateps)
Parameters:
Name Type
--------------------------------------------------------
f a function of one argument to be minimized
iniguess an initial value for the argument of f
incr an initial increment to probe around iniguess
relateps a relative error goal in the argument of f
Returns:
[numeric, numeric] : [x,f(x)], a pair, where x is a local minimum of f
Synopsis: MinimizeBrent minimizes the function f(x) in one variable. The
minimization starts at the point "iniguess" and probes around with initial
increment "incr". By giving a small increment, one can have some assurance
that a local minimum close to the initial guess will be found. Additional
arguments to MinimizeBrent are passed to the function f(x), so that f(x) can
be written without using global variables. If the function cannot achieve
the accuracy requested in 200 iterations it will stop. The algorithm uses a
technique based on 3 points spaced by the golden ratio which was introduced
by Richard Brent.
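Illustration (Python, not Darwin): a sketch of a golden-ratio search with the
same arguments as above. It first brackets a minimum by walking downhill from
iniguess with growing steps, then shrinks the bracket by the golden ratio. It
omits any further refinements of Brent's method, and the name golden_minimize
and the bracketing rule are assumptions.

```python
import math

def golden_minimize(f, iniguess, incr, relateps, max_iter=200):
    """Return [x, f(x)] for a local minimum of f near iniguess."""
    # 1. Bracket a minimum: walk downhill with growing steps until f rises.
    a, b = iniguess, iniguess + incr
    fa, fb = f(a), f(b)
    if fb > fa:                               # make a -> b point downhill
        a, b, fa, fb = b, a, fb, fa
    c = b + (b - a)
    fc = f(c)
    for _ in range(max_iter):
        if fc >= fb:
            break
        a, b, fa, fb = b, c, fb, fc
        c = b + 1.618 * (b - a)
        fc = f(c)
    lo, hi = min(a, c), max(a, c)
    # 2. Shrink the bracket; each step reuses one interior point.
    r = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - r * (hi - lo), lo + r * (hi - lo)
    f1, f2 = f(x1), f(x2)
    for _ in range(max_iter):
        if hi - lo <= relateps * max(abs(x1), abs(x2), 1e-300):
            break
        if f1 < f2:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - r * (hi - lo)
            f1 = f(x1)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + r * (hi - lo)
            f2 = f(x2)
    x = (lo + hi) / 2
    return [x, f(x)]
```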
Examples:
> MinimizeBrent( cos, 3, 0.01, 1e-7 );
[3.1416, -1.0000]
See Also:
?BFGSMinimize ?MaximizeFunc ?Minimize2DFunc ?MinimizeSD
?DisconMinimize ?MaxLikelihoodSize ?MinimizeFunc ?NBody
MinimizeFunc
Function MinimizeFunc - Minimize a multivariate function using hill descending
Calling Sequence: MinimizeFunc(f,iniguess,epsini,epsfinal)
Parameters:
Name Type
-------------------------
f procedure
iniguess array(numeric)
epsini numeric
epsfinal numeric
Returns:
[x, f(x)]
Global Variables: Minimize_args
Synopsis: Starting at iniguess with error tolerance epsini, this function
minimizes f until the accuracy in each dimension is less than or equal to
epsfinal. The function f takes an array of arguments. It returns the
argument and the value of the local minimum found. The dimension of the
array is given by the dimension of iniguess.
Examples:
> MinimizeFunc(x -> 3*tan(x[1])+abs(x[2]), [0.33, 0.44], 0.2, 0.1);
[[-1.5708, -0.01126905], -362492.3547]
See Also:
?BFGSMinimize ?MaximizeFunc ?Minimize2DFunc ?MinimizeSD
?DisconMinimize ?MaxLikelihoodSize ?MinimizeBrent ?NBody
MinimizeSD
Function MinimizeSD - Minimize a multivariate function using steepest descent
Calling Sequence: MinimizeSD(f,iniguess,relateps)
Parameters:
Name Type Description
--------------------------------------------------------------------------
f procedure the function to minimize, returns [f(x),f'(x)]
iniguess array(numeric) initial guess
relateps numeric stop when |f'(x)| < |x| * relateps
Returns:
[x, f(x),f'(x)]
Synopsis: Starting at iniguess this function searches a local minimum in the
direction of the steepest descent. The direction of the steepest descent is
used as long as the actual function decrease agrees (up to 90%) with the
predicted (from the gradient) decrease. This guarantees that the minimum
found is the one in the direction of the initial steepest descent. The
convergence of this function is fast when it is searching far from the
minimum and then it becomes slow when it is close to the minimum.
MinimizeSD returns the list [x,f(x),f'(x)] at a local minimum (when the
convergence criterion is met) or when the number of iterations exceeds 200.
The function f(x) should compute the functional and its gradient and these
should be returned as a list of two values: [fx:numeric,f1x:list(numeric)]
Additional arguments passed on to MinimizeSD (fourth, fifth, etc.) are
passed as additional arguments to f(). In this way f() usually does not
need to rely on global information.
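Illustration (Python, not Darwin): a sketch of steepest descent with a
backtracking step, using the same convention that f returns [value, gradient].
A step is accepted when the actual decrease is at least half of the decrease
predicted from the gradient; this 50% rule, the step doubling and the name
minimize_sd are assumptions and differ from the 90% agreement rule described
above.

```python
import math

def minimize_sd(f, iniguess, relateps, max_iter=200):
    """f(x) returns [value, gradient]; return [x, f(x), gradient]."""
    x = list(iniguess)
    fx, g = f(x)
    step = 1.0
    for _ in range(max_iter):
        gnorm = math.sqrt(sum(gi * gi for gi in g))
        xnorm = math.sqrt(sum(xi * xi for xi in x))
        if gnorm < xnorm * relateps:           # gradient small relative to |x|
            break
        while step > 1e-16:
            cand = [xi - step * gi for xi, gi in zip(x, g)]
            fc, gc = f(cand)
            # Accept if the decrease is at least half of the predicted one.
            if fc <= fx - 0.5 * step * gnorm * gnorm:
                x, fx, g = cand, fc, gc
                break
            step *= 0.5                         # otherwise shorten the step
        else:
            break                               # no acceptable step was found
        step *= 2.0                             # be more ambitious next time
    return [x, fx, g]
```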
Examples:
> f := x -> [sin(x[1])+x[2]^2,[cos(x[1]),2*x[2]]];
f := x -> [sin(x[1])+x[2]^2, [cos(x[1]), 2*x[2]]]
> MinimizeSD(f, [0.33, 0.44], 0.001);
[[-1.5695, 1.2084e-11], -1.0000, [0.00133735, 2.4168e-11]]
See Also:
?BFGSMinimize ?MaximizeFunc ?Minimize2DFunc ?MinimizeFunc
?DisconMinimize ?MaxLikelihoodSize ?MinimizeBrent ?NBody
Multinomial_Rand
Function Multinomial_Rand - Generate random multinomially distributed integers
Calling Sequence: Rand(Multinomial(n,ps))
Multinomial_Rand(n,ps)
Parameters:
Name Type Description
--------------------------------------------
n integer number of experiments
ps list(numeric) probabilities
Returns:
list(integer)
Synopsis: Given k probabilities ps=[p_1,..., p_k], this function returns a
list of k random integers multinomially distributed with averages n*p_i,
variances n*p_i*(1-p_i) and covariances -n*p_i*p_j. The sum of all integers
is n. Multinomial_Rand uses Rand() which can be seeded by either the
function SetRand or SetRandSeed.
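Illustration (Python, not Darwin): a sketch that draws multinomial counts by
simulating the n experiments one by one, which takes O(n) time. This is the
simplest method and is not claimed to be the one used by Multinomial_Rand; the
name multinomial_rand is an assumption.

```python
import random

def multinomial_rand(n, ps, rng=random):
    """Return len(ps) counts that sum to n, one per probability in ps."""
    counts = [0] * len(ps)
    # Each experiment falls into category i with probability ps[i].
    for i in rng.choices(range(len(ps)), weights=ps, k=n):
        counts[i] += 1
    return counts
```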
References: MB Brown and J Bromberg (1984), The American Statistician
Examples:
> Rand(Multinomial(100,[0.3,0.2,0.5]));
[34, 24, 42]
> Rand(Multinomial(1000,[0.01,0.9,0.09]));
[9, 898, 93]
See Also:
?Beta_Rand ?Exponential_Rand ?Normal_Rand ?StatTest
?Binomial_Rand ?FDist_Rand ?Poisson_Rand ?Std_Score
?ChiSquare_Rand ?GammaDist_Rand ?SetRand ?Student_Rand
?CreateRandSeq ?Geometric_Rand ?SetRandSeed ?Zscore
?Cumulative ?Graph_Rand ?Shuffle
MultipleSubTree
Function MultipleSubTree( MinSquareTree:Tree, MaxPW:array )
generates an array of arrays of SubTrees for all Pam windows in MaxPW
Mutate
Function Mutate - randomly mutate an amino acid sequence
Calling Sequence: Mutate(seq,PAM,DelType)
Parameters:
Name Type Description
--------------------------------------------------------------
seq string original amino acid sequence
PAM numeric desired PAM distance to mutate
DelType {ExpGaps,ZipfGaps} optional, model for making gaps
Returns:
string
Synopsis: This function simulates evolution by performing random mutations in
an amino acid sequence. The random mutations correspond to the PAM distance
given and follow the mutation matrix at that distance.
If the third argument is given, gaps will be inserted, with either an
exponential or a Zipfian distribution. If no third parameter is given, no
gaps will be inserted. Mutate will use the mutation matrices available in
logPAM1, which are normally set by CreateDayMatrices(). If these matrices
are created for DNA (only A,C,G and T), then the function Mutate will mutate
a DNA sequence.
Examples:
> Mutate(CreateString(40,A),100);
AKAASVAAFGGTNRAGSAAHASEAARGFNTAAPPTAPADE
See Also:
?CreateDayMatrices ?CreateRandPermutation ?CreateRandSeq ?Shuffle
MySql
Function MySql - Wrapper for MySQL client
Calling Sequence: MySql(query)
Parameters:
Name Type Description
----------------------------------------------------------------------------------------------------------
query string The MySQL query to be executed.
setParseColumns {list(posint),set(posint)} (optional) columns to parse
host host=string (optional) URL of the MySQL server
user user=string (optional) the MySQL username to use when connecting
password password=string (optional) the password to use when connecting
port port=integer (optional) the TCP/IP port number to use for the connection
database database=string (optional) the name of the database to use
Returns:
MySqlResult
Synopsis: The MySql function can be used to access any MySQL database. The
passed query in sql format is executed on the (remote) server and the result
is returned to the user.
Optional arguments and their default values:
setParseColumns A list/set of integers to indicate which columns should be
parsed. Unparsed columns appear as strings in the result.
By default, no columns are parsed.
host=string The URL where the MySQL server is running. The default is
'linneus54.inf.ethz.ch'.
user=string The username to be used when connecting to the server.
The default username is 'darwin'.
password=string The password to be used when connecting to the server. By
default, no password is used.
port=integer The TCP/IP port number of the server. If no port number
is provided, the default MySQL port is used.
database=string The name of the database to use. The default database is
'vpeikert' if the host is linneus54, otherwise no
database is selected.
Examples:
> MySql('Select genome_5letter, entry_nr, entry_seq from
genome, entry where entry_id IN (44,45) and entry_genome_id=genome_id'):;
MySqlResult([genome_5letter, entry_nr, entry_seq],[[BACSU, 44,
MAKTLSDIKRSLDGNLGKRLTLKANGGRRKTIERSGILAETYPSVFVIQLDQDENSFERVSYSYADILTETVELTFNDDAASSVAF],
[BACSU, 45, MGRRRGVMSDEFKYELAKDLGFYDTVKNGGWGEIRARDAGNMVKRAIEIAEQQMAQNQNNR]])
> MySql('Select * from oma where oma_id=9233', database='oma_sep08');
MySqlResult([oma_id, oma_entry_id],[[9233, 1039323], [9233, 1107833], [9233,
2057091], [9233, 2201433]])
See also: ?MySqlResult ?OpenPipe ?OpenReading ?ReadRawFile
MySqlResult
Class MySqlResult - the result of a MySql function call
Template: MySqlResult(ColumnLabels,Data)
Fields:
Name Type Description
-------------------------------------------------------------
ColumnLabels list(string) the labels of each column
Data matrix the two dimensional data matrix
Methods: MySqlResult_type print Rand select string
Synopsis: A MySqlResult structure stores the result of a MySql query. The
data of a column can be retrieved using the column label as the selector on
the MySqlResult structure.
Examples:
> x := MySql('Select * from oma where oma_id=9233', database='oma_sep08'):;
MySqlResult([oma_id, oma_entry_id],[[9233, 1039323], [9233, 1107833], [9233,
2057091], [9233, 2201433]])
> x['Data'];
[[9233, 1039323], [9233, 1107833], [9233, 2057091], [9233, 2201433]]
> x['oma_entry_id'];
[1039323, 1107833, 2057091, 2201433]
See also: ?MySql
NBody
Function NBody
Option: builtin
Calling Sequence: NBody(dist,var,k1)
NBody(dist,var,k1,k2,rho1,rho2,inipos)
Parameters:
Name Type
---------------------------------------------------------------------
dist distance matrix(numeric > 0) (1..n x 1..n)
var distance variance matrix(numeric >= 0) (1..n x 1..n)
k1 posint, initial dimension, k2 <= k1
k2 posint, final dimension
rho1 numeric >= 0, point separation force
rho2 numeric >= 0, point sequencing force
inipos matrix(numeric), initial guesses (1..n x 1..k), k=k1 or k=k2
Returns:
matrix(numeric) : coordinates of points (1..n x 1..k2)
Synopsis: NBody solves the n-body steady state problem for k1 dimensions,
then squeezes the coordinates to k2 dimensions. The problem is defined as
minimizing the sum
( |x[i]-x[j]| - dist[i,j] ) ^ 2 / var[i,j]
In other words, it does a least squares approximation of the distances.
Var[i,j]=0 is an indication that the distance between i,j should not be used
(fitted). When the errors in the fitting are supposed to be relative to the
values of the distances, it is typical to use var = dist. k2, rho1, rho2
and inipos are all optional. If used, they must appear in the given order.
If k2 is not present it is assumed to be equal to k1.
Rho1 is used to guarantee some separation of the points. The value
rho1 / (x[i]-x[j]) ^ 4 is added to the function to minimize. If not present,
rho1 defaults to 0.
Rho2 is used to impose sequencing over the points. The value
rho2 * |x[i]-x[i+1]| is added to the function to minimize. This guarantees
that, if there are many choices, the selected one will have the original
sequence preserved, like a chain. If not present, rho2 defaults to 0.01.
Inipos is an initial guess of the positions of the bodies. It is a matrix
of dimension 1..n x 1..k1 (or 1..n x 1..k2). If not present, the algorithm
starts with random locations (it uses the function Rand()).
The global variable NBodyPotential is set to the minimum value of the cost
potential.
Examples:
> dist := [ [0,1,1], [1,0,1], [1,1,0] ];
dist := [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
> NBody(dist,dist,2,2,0,0);
[[0, 0], [1.0000, 0], [0.5000, 0.8660]]
> NBodyPotential;
1.3194e-18
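To make the minimized sum concrete, here is a small Python sketch of the cost with rho1 = rho2 = 0, evaluated at the coordinates returned in the example above. It assumes the sum runs once over each pair i < j and skips pairs with var[i][j] = 0; the function name nbody_cost is hypothetical and this is not Darwin's implementation.

```python
import math

def nbody_cost(points, dist, var):
    """Sum of (|x[i]-x[j]| - dist[i][j])^2 / var[i][j] over pairs i < j.

    Pairs with var[i][j] == 0 are skipped (not fitted), as described in the
    synopsis. Summing over i < j only is an assumption of this sketch.
    """
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            if var[i][j] == 0:
                continue
            d = math.dist(points[i], points[j])  # Euclidean distance
            total += (d - dist[i][j]) ** 2 / var[i][j]
    return total

dist = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# Equilateral triangle with unit sides, matching the example output.
points = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
cost = nbody_cost(points, dist, dist)
```

For this configuration every fitted distance equals its target, so the cost is zero up to rounding, consistent with the tiny NBodyPotential reported above.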
See Also:
?BFGSMinimize ?MaximizeFunc ?Minimize2DFunc ?MinimizeFunc
?DisconMinimize ?MaxLikelihoodSize ?MinimizeBrent ?MinimizeSD
NSubGene
Function NSubGene
Calling Sequence: NSubGene(g,baseRange)
Parameters:
Name Type
-------------------------------
g Gene data structure
baseRange posint..posint
Returns:
Gene
Synopsis: Returns the modified Gene containing only bases in baseRange.
Examples:
See also: ?Gene ?PSubGene
Normal_Rand
Function Normal_Rand - Generate random normally distributed numbers
Options: builtin and numeric
Calling Sequence: Rand(Normal)
Rand(Normal(m,s2))
Normal_Rand()
Parameters:
Name Type Description
---------------------------------------------------
m numeric expected value of the variable
s2 nonnegative variance of the variable
Returns:
numeric
Synopsis: This function returns a random number normally distributed with
average 0 and variance 1. Normal_Rand uses Rand() which can be seeded by
either the function SetRand or SetRandSeed. A normal variable with average
m and variance s2 is obtained with the expression sqrt(s2)*Rand(Normal) + m
or with Rand(Normal(m,s2)).
References: Handbook of Mathematical functions, Abramowitz and Stegun,
26.1.26 and 26.2
Examples:
> Normal_Rand();
1.5093
> [Rand(Normal),Rand(Normal)];
[-0.9358, 0.5327]
> Rand(Normal(10,0.001));
9.9755
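The scaling rule sqrt(s2)*Rand(Normal) + m from the synopsis can be illustrated in Python. This is a sketch using Python's own random generator, not Darwin's Rand(); the name normal_rand and the seed are arbitrary choices made for the illustration.

```python
import math
import random

def normal_rand(m=0.0, s2=1.0, rng=random):
    """Scale and shift a standard normal draw to mean m and variance s2."""
    return math.sqrt(s2) * rng.gauss(0.0, 1.0) + m

rng = random.Random(12345)  # arbitrary seed, for reproducibility only
samples = [normal_rand(10, 0.001, rng) for _ in range(10000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / (len(samples) - 1)
```

With 10000 draws, sample_mean and sample_var should be close to 10 and 0.001 respectively.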
See Also:
?Beta_Rand ?Exponential_Rand ?Multinomial_Rand ?StatTest
?Binomial_Rand ?FDist_Rand ?Poisson_Rand ?Std_Score
?ChiSquare_Rand ?GammaDist_Rand ?SetRand ?Student_Rand
?CreateRandSeq ?Geometric_Rand ?SetRandSeed ?Zscore
?Cumulative ?Graph_Rand ?Shuffle
Normalize
Function Normalize
Calling Sequence: Normalize(m)
Parameters:
Name Type
------------------
m NucPepMatch
Returns:
NucPepMatch
Global Variables: DB
Synopsis: Normalizes a match referencing (the complement of) a NucDB
database entry to refer to a sequence being present in memory.
Examples:
See also: ?Denormalize
NucPepBackDynProg
Function NucPepBackDynProg - Backwards dynamic programming alignment for
peptide and nucleotide sequences
Option: builtin
Calling Sequence: NucPepBackDynProg(nuc,pep,DM,len1,len2,IntronScoring)
Parameters:
Name Type Description
---------------------------------------------------------------
nuc string a nucleotide sequence
pep string a peptide sequence
DM DayMatrix Dayhoff Matrix
len1 integer optional length of the 1st sequence
len2 integer optional length of the 2nd sequence
IntronScoring list optional Intron Scoring list
Returns:
NULL
Synopsis: Compute the similarity and lengths of the best alignment between
nuc and pep using the Dayhoff matrix DM, the optional lengths len1 and len2
and the optional IntronScoring doing backwards dynamic programming. If the
lengths are not given or -1, return the maximum similarity.
Examples:
See Also:
?AlignNucPepAll ?FindNucPepPam ?LocalNucPepAlignBestPam
?AlignNucPepMatch ?GlobalNucPepAlign ?NucPepDynProg
?DynProgNucPepString ?LocalNucPepAlign ?NucPepMatch
NucPepDynProg
Function NucPepDynProg - Compute a Nucleotide Peptide Alignment
Option: builtin
Calling Sequence: NucPepDynProg(nuc,pep,DM,len1,len2,IntronScoring)
Parameters:
Name Type Description
-----------------------------------------------------------
nuc string Nucleotide Sequence
pep string Peptide Sequence
DM DayMatrix Dayhoff Matrix
len1 integer optional length of 1st sequence
len2 integer optional length of 2nd sequence
IntronScoring list Intron scoring
Returns:
NULL
Synopsis: Compute the similarity and lengths of the best alignment between
nuc and pep using the Dayhoff matrix DM, the optional lengths len1 and len2
and the optional IntronScoring. If the lengths are not given or -1, return
the maximum similarity.
Examples:
See Also:
?AlignNucPepAll ?FindNucPepPam ?LocalNucPepAlignBestPam
?AlignNucPepMatch ?GlobalNucPepAlign ?NucPepBackDynProg
?DynProgNucPepString ?LocalNucPepAlign ?NucPepMatch
NucPepMatch
Class NucPepMatch
Template: NucPepMatch(NucEntries,PepEntries)
NucPepMatch(NucOffset,PepOffset)
NucPepMatch(Sim,NucOffset,PepOffset,NucLen,PepLen)
NucPepMatch(Sim,NucOffset,PepOffset,NucLen,PepLen,PamNumber)
NucPepMatch(Sim,NucOffset,PepOffset,NucLen,PepLen,PamNumber,
PamVariance)
NucPepMatch(Sim,NucOffset,PepOffset,NucLen,PepLen,PamNumber,
PamVariance,IntronScoring)
NucPepMatch(Sim,NucOffset,PepOffset,NucLen,PepLen,PamNumber,
PamVariance,IntronScoring,NucGaps,PepGaps,Introns)
Fields:
Name Type Description
-----------------------------------------------------------------------------------------
Sim numeric Similarity score
NucOffset integer Offset of the nucleotide sequence in NucDB
PepOffset integer Offset of the peptide sequence in PepDB
NucLength integer Length of the nucleotide sequence
PepLength integer Length of the peptide sequence
PamNumber numeric Estimated PAM distance for the match
PamVariance numeric Estimated PAM variance for the match
IntronScoring {0,string,structure} Function for scoring introns
NucGaps list Gaps in the nucleotide sequence from the alignment
PepGaps list Gaps in the peptide sequence from the alignment
Introns list List of suspected introns
Returns:
NULL
Methods: Entry Gene ID NucPepMatch_type print select
Global Variables: DB
Synopsis: The NucPepMatch structure holds all the necessary information for
the alignment of a peptide and a nucleotide sequence. The offsets are
positions into a peptide and nucleotide database, hence NucPepMatch requires
that appropriate databases have been loaded.
Examples:
See Also:
?AlignNucPepAll ?GetPosition ?NucPepDynProg
?AlignNucPepMatch ?GlobalNucPepAlign ?NucPepRegions
?Denormalize ?Intron ?ParallelAllNucPepMatches
?DynProgNucPepString ?LocalNucPepAlign ?PepDB
?FindNucPepPam ?LocalNucPepAlignBestPam ?ScoreIntron
?Gene ?Match ?VisualizeGene
?GetAllNucPepMatches ?Normalize ?VisualizeProtein
?GetIntrons ?NucDB
?GetPeptides ?NucPepBackDynProg
NucPepRegions
Function NucPepRegions
Option: builtin
Calling Sequence: NucPepRegions(npm)
Parameters:
Name Type Description
------------------------------------------------
npm NucPepMatch a nucleotide peptide Match
Returns:
list
Synopsis: Converts a NucPepMatch into a list of alignment regions. Region
formats are:
- [ALIGN, Sim, nucLen, pepLen]
- [NUCGAP, Sim, nucLen, 0]
- [PEPGAP, Sim, 0, pepLen]
- [INTRON, Sim, nucLen, 0]
After r := NucPepRegions (m), the following equations hold:
sum (zip ((x->x[2])(r))) = m[Sim]
sum (zip ((x->x[3])(r))) = m[NucLength]
sum (zip ((x->x[4])(r))) = m[PepLength]
If either PepDB or NucDB is not loaded, Sim will be 0 in ALIGN regions. If no
suitable Dayhoff matrix can be found, Sim will be 0 in ALIGN, NUCGAP and
PEPGAP regions.
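The three sum equations in the synopsis say that the regions partition the match: scores and lengths add up to the totals stored in the NucPepMatch. The Python sketch below checks this on an invented region list; the numbers are made up for illustration and do not come from a real alignment. Note that Python indexes from 0, so Darwin's x[2], x[3], x[4] become r[1], r[2], r[3].

```python
# Invented regions in the documented layout [kind, Sim, nucLen, pepLen].
regions = [
    ["ALIGN", 120.5, 30, 10],
    ["INTRON", -20.0, 85, 0],
    ["ALIGN", 64.0, 18, 6],
    ["PEPGAP", -15.5, 0, 2],
]

def region_totals(regions):
    """Return (total Sim, total nucleotide length, total peptide length)."""
    sim = sum(r[1] for r in regions)
    nuc_len = sum(r[2] for r in regions)
    pep_len = sum(r[3] for r in regions)
    return sim, nuc_len, pep_len

totals = region_totals(regions)
```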
Examples:
See also: ?NucPepMatch
OpenAppending
Function OpenAppending
Option: builtin
Calling Sequence: OpenAppending(fname)
OpenAppending(terminal)
OpenAppending(previous)
Parameters:
Name Type
--------------------------
fname filename
terminal system variable
previous system variable
Returns:
NULL
Synopsis: If the parameter is the system name "terminal", all subsequent
output generated by Darwin is sent to the standard output. This is
typically the terminal. Otherwise, all subsequent output generated will be
appended to the file "fname". Fname can be a name or an entire path. If no
file named "fname" exists, Darwin creates such a file. The options
"terminal" and "previous" behave the same way as they do for OpenWriting.
Examples:
> OpenAppending('~hallett/bankaccount');
> print('Debit 100000 SFr.');
> OpenWriting(terminal);
>
See Also:
?FileStat ?OpenReading ?ReadLine ?ReadRawLine
?inputoutput ?OpenWriting ?ReadOffsetLine
?LockFile ?ReadData ?ReadRawFile
OpenPipe
Function OpenPipe - execute system command and pipe output to Darwin
Option: builtin
Calling Sequence: OpenPipe(cmd)
Parameters:
Name Type Description
--------------------------------------------------------
cmd string a command for the underlying UNIX system
Returns:
NULL
Synopsis: OpenPipe executes the command described by the string cmd and
directs its output to be the input for Darwin. This is called opening a
pipe in Unix terminology. This output is readable with ReadRawLine()
commands (simply as text) or with ReadLine() commands (when the output
consists of valid Darwin commands). When the output is exhausted, the string
EOF will be returned by the read commands and the pipe will be closed.
Examples:
> OpenPipe(date);
> ReadRawLine();
Thu Oct 12 08:01:39 MET DST 2000
> ReadRawLine();
EOF
See Also:
?CallSystem ?OpenAppending ?ReadOffsetLine ?SystemCommand
?FileStat ?OpenReading ?ReadRawFile ?TimedCallSystem
?inputoutput ?OpenWriting ?ReadRawLine
?LockFile ?ReadLine ?SplitLines
OpenReading
Function OpenReading - open a file for future reading
Option: builtin
Calling Sequence: OpenReading(filename)
Parameters:
Name Type
-----------------
filename string
Returns:
NULL
Synopsis: This function opens the file given as argument for reading. Any
future ReadRawLine or ReadLine commands will read data from the opened file.
When the end of the file is reached, the read commands will return the token
EOF. If the argument is the name 'terminal', then the standard input (stdin
in Unix) is opened for input. A file that is opened this way can contain
Darwin commands or any arbitrary text. If a ReadRawLine() command is used,
then a textual line of the file will be read. If a ReadLine() command is
used, then the line is expected to be a valid Darwin command. If filename
ends in ".gz" or ".Z", then it is assumed to be a compressed file and it is
decompressed before reading.
Examples:
> OpenReading( '/home/darwin/test' );
> t := ReadRawLine();
> OpenReading(terminal);
See Also:
?FileStat ?OpenAppending ?ReadLine ?ReadURL
?inputoutput ?OpenPipe ?ReadOffsetLine ?ServerSocket
?LockFile ?OpenWriting ?ReadRawFile
?MySql ?ReadData ?ReadRawLine
OpenWriting
Function OpenWriting
Option: builtin
Calling Sequence: OpenWriting(fname)
OpenWriting(terminal)
OpenWriting(previous)
Parameters:
Name Type Description
-----------------------------------
fname string filename
terminal symbol system variable
previous symbol system variable
Returns:
NULL
Synopsis: If a file name is given as the parameter, OpenWriting will open a
file named fname and send all subsequent output directed towards the standard
output into this file. If fname already exists, it is overwritten. If
"terminal" is specified, all subsequent output is directed back towards the
standard output (typically the monitor). If "previous" is specified, then
the current output stream is closed and subsequent output reverts to the
stream which was active before the previous OpenWriting or OpenAppending.
Examples:
> OpenWriting('~hallett/Book/mainfile');
> print('A quick way to create a lot of work for myself');
> OpenWriting(terminal);
See Also:
?FileStat ?OpenAppending ?ReadOffsetLine
?inputoutput ?OpenReading ?ReadRawFile
?LockFile ?ReadLine ?ReadRawLine
OrthologousGroup
Class OrthologousGroup - information about an orthologous group of sequences
Template: OrthologousGroup(Species,Seqs,AllAll)
Fields:
Name Type Description
--------------------------------------------------------------------------
Species list(string) species of each sequence
Seqs list(string) the amino acid sequence
AllAll matrix({0,Alignment}) All-against-all alignments
Length posint number of sequences in group
Tree Tree phylogenetic distance tree for the group
Methods: OrthologousGroup_type Rand select string
Synopsis: This is the main result of the function Orthologues. It stores the
information about a group (clique) of orthologous sequences belonging to
various species.
See also: ?Orthologues ?PhylogeneticTree ?Species_Entry ?SP_Species
Orthologues
Function Orthologues - find orthologous groups between various species
Calling Sequence: Orthologues(SpeciesList,SampleSeq,...)
Parameters:
Name Type Description
--------------------------------------------------------------------------
SpeciesList list(string) a list of strings identifying species
SampleSeq string a sequence, find all homologous
MinScore MinScore = positive minimum score for determining homology
ScoreTol ScoreTol = positive score tolerance for stable pairs
LenthTol LenthTol = positive length ratio tolerance for homology
Returns:
list(OrthologousGroup)
Synopsis: Orthologues finds the orthologous groups between a set/list of
species. All the parameters are optional, but one of SpeciesList or
SampleSeq must be provided. An orthologous pair is a pair of homologous
sequences which have diverged because of speciation alone. That is, the
most recent ancestor of the two sequences resided in the most recent common
ancestor of both species. The process follows four steps:
(1) An all-against-all alignment of all sequences in the species (or all
sequences homologous to the given sample) is done. The alignments with a
score above MinScore (default 300) are refined to compute their distance.
The alignment length has to be at least LengthTol (default 70%) of the
length of the shorter sequence.
(2) The stable pairs are found, that is, pairs which score highest among all
pairs in both directions. This maximum score is accepted with a
percentage tolerance given by ScoreTol (default 95%).
(3) The stable pairs are compared against all other species to see if they are
paralogous and not orthologous; the ones which survive the tests are
called verified stable pairs.
(4) Cliques of the verified stable pairs are extracted, one at a time to form
the orthologous groups.
The alignments are done using the Dayhoff matrices stored in DM and DMS
(normally built with CreateDayMatrices). The orthologous groups are
returned in a list of OrthologousGroup data structures.
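Step (2) can be sketched for two species as a reciprocal-best-score test with a tolerance. The Python code below is one possible reading of that step, not Darwin's implementation: it assumes a pair is stable when its score is at least ScoreTol times the best score of each of its two sequences. The function name stable_pairs and the scores are invented for the illustration.

```python
def stable_pairs(scores, score_tol=0.95):
    """scores maps (a, b) -> alignment score, a from species A, b from species B.

    A pair is kept when its score is within the tolerance of the best score
    seen for a and of the best score seen for b (an assumed interpretation).
    """
    best_a, best_b = {}, {}
    for (a, b), s in scores.items():
        best_a[a] = max(best_a.get(a, s), s)
        best_b[b] = max(best_b.get(b, s), s)
    return sorted(
        pair
        for pair, s in scores.items()
        if s >= score_tol * best_a[pair[0]] and s >= score_tol * best_b[pair[1]]
    )

# Invented scores: a2 scores best against b1, but b1 clearly prefers a1.
scores = {
    ("a1", "b1"): 900.0,
    ("a1", "b2"): 400.0,
    ("a2", "b1"): 500.0,
    ("a2", "b2"): 480.0,
}
pairs = stable_pairs(scores)
```

Here ("a2", "b2") is accepted because 480 is within 95% of a2's best score of 500, while ("a2", "b1") is rejected because b1 scores far higher against a1.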
Examples:
> Orthologues(['Picea abies', 'Pinus contorta', 'Pinus radiata']);
[OrthologousGroup([Picea abies, Pinus radiata],[CWELYWLEHGIQPDGMMPSDTTVGVGDDAFNTFFSETGAGKHVPRAVFVDLEPTVIDEVRTGAYRQLFHPEQLISGKEDAANNFARGHYTVGEEIVDLCLDRVRKLADNCTGL, MSPKTETKASVGFKAGVKDYRLTYYTPEYQTKDTDILAAFRVTPQPGVPP ..(475).. IKFEFDVIDRL],[[0, Alignment(Sequence(AC('O82035'))[1..356],Sequence(AC('Q40976'))[1..356],4045.0074,DMS[209],4.7818,1.3803,{Local})], [Alignment(Sequence(AC('O82035'))[1..356],Sequence(AC('Q40976'))[1..356],4045.0074,DMS[209],4.7818,1.3803,{Local}), 0]])]
See Also:
?Align ?CreateDayMatrices ?Species_Entry
?Alignment ?OrthologousGroup ?SP_Species
OutsideBounds
Function OutsideBounds - test whether Stats could be the same
Calling Sequence: OutsideBounds(a,b,Confidence = 0.9750)
Parameters:
Name Type Description
------------------------------------------------------------
a Stat Stat data structure to be compared
b Stat second Stat to be compared or
b numeric value to be compared
Confidence positive confidence level (defaults to 0.975)
Returns:
boolean
Synopsis: OutsideBounds checks whether two univariate statistics (Stat
objects) or one univariate statistic and a value represent different values
with a certain confidence level. The confidence level is set to 0.975 by
default, which gives the usual 2.5% error on one side, or 1.96 standard
deviations away from the mean for a normal variable. If the second argument
is a single value, the test is equivalent to determining whether the first
distribution could have average b at the given confidence level. The
Confidence level can be changed to any value c with 0.5 <= c < 1.
Examples:
> st := Stat('Near 7'):
> to 10 do st+Rand(6.5..7.5) od:
> print(st);
Near 7: number of sample points=10
mean = 6.95 +- 0.20
variance = 0.105 +- 0.058
skewness=0.146037, excess=-1.35184
minimum=6.53935, maximum=7.48143
> OutsideBounds(st,7);
false
> OutsideBounds(st,6.5);
true
> OutsideBounds(st,6.5,Confidence=0.999999);
false
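The relation between the confidence level and the number of standard deviations can be checked with a short Python sketch. It covers only the Stat-versus-value case and assumes a normal approximation in which the value is compared against mean +- z * sqrt(variance / n); this is an assumed model, not Darwin's code, and the name outside_bounds is hypothetical. The inputs are the rounded mean, variance and sample count printed in the example above.

```python
import math
from statistics import NormalDist

def outside_bounds(mean, variance, n, value, confidence=0.975):
    """True if value lies outside mean +- z * sqrt(variance / n)."""
    z = NormalDist().inv_cdf(confidence)  # about 1.96 for 0.975
    half_width = z * math.sqrt(variance / n)
    return abs(mean - value) > half_width

z_default = NormalDist().inv_cdf(0.975)
results = [
    outside_bounds(6.95, 0.105, 10, 7),
    outside_bounds(6.95, 0.105, 10, 6.5),
    outside_bounds(6.95, 0.105, 10, 6.5, confidence=0.999999),
]
```

Under this assumption the half width at the default level is about 0.20, which agrees with the "+- 0.20" printed for the mean, and the three results agree with the outputs shown above.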
See also: ?ExpFit ?LinearRegression ?Stat ?StatTest ?UpdateStat
PASfromMSA
Function PASfromMSA
Calling Sequence: PASfromMSA(msa)
PASfromMSA(msa,lnM,freq)
Parameters:
Name Type Description
----------------------------------------------------------
msa MAlignment multiple sequence alignment
lnM matrix(numeric) (optional) log. of a 1-PAM matrix
freq array(numeric) (optional) character frequencies
Returns:
ProbSeq
Synopsis: Computes the probabilistic ancestral sequence at the root of a
phylogenetic tree over a multiple sequence alignment of probabilistic
sequences. For protein sequences, the global variable NewLogPAM1 is assumed
to describe the amino acid mutation probabilities. The global variable
LogLikelihoods will be assigned to an array containing the ln of the
likelihoods at each position.
References: GM Cannarozzi, A Schneider and GH Gonnet (2007): Probabilistic
Ancestral Sequences Based on the Markovian Model of Evolution - Algorithms
and Applications, in: D Liberles (editor): Ancestral Sequence
Reconstruction, Oxford University Press.
Examples:
> seqs := ['AAAR','AARR','VTAARRQQ']:
> msa := MAlign(seqs):;
dimensionless fitting index 1470
> print(msa);;
Multiple sequence alignment:
----------------------------
Score of the alignment: 54.882333
Maximum possible score: 54.882333
Sequence 1 _AAAR___
Sequence 2 __AARR__
Sequence 3 VTAARRQQ
> pas := PASfromMSA(msa):
> print(pas);;
pos Most probable chars
1 V 0.83 I 0.05 L 0.04 A 0.03 T 0.02
2 A 0.70 T 0.25 S 0.03 V 0.01 K 0.00
3 A 1.00 S 0.00 V 0.00 G 0.00 T 0.00
4 A 1.00 S 0.00 V 0.00 G 0.00 T 0.00
5 R 1.00 K 0.00 Q 0.00 A 0.00 S 0.00
6 R 0.97 K 0.01 Q 0.00 A 0.00 L 0.00
7 Q 0.70 E 0.06 K 0.05 R 0.03 A 0.03
8 Q 0.70 E 0.06 K 0.05 R 0.03 A 0.03
See Also:
?MAlign ?PASfromTree ?ProbSeq
?MAlignment ?ProbAncestor ?PSDynProg
PASfromTree
Function PASfromTree
Calling Sequence: PASfromTree(seqs,tree)
PASfromTree(seqs,tree,lnM,freq,gapcosts)
Parameters:
Name Type Description
----------------------------------------------------------------------
seqs array({ProbSeq,string}) (probabilistic) sequences
tree Tree tree of the sequences
lnM matrix(numeric) (optional) log. of a 1-PAM matrix
freq array(numeric) (optional) freq. of characters
gapcosts procedure (optional) gap cost function
Synopsis: Computes the probabilistic ancestral sequence at the root of a
phylogenetic tree over a list of probabilistic sequences. For each internal
node, the prob. sequences at the roots of the two subtrees are aligned and
then an ancestral vector is computed. The global variable LogLikelihoods
will be assigned to an array containing the ln of the likelihoods at each
position. The third field of the leaves must be integer numbers
corresponding to the sequences in the list (as is automatically the case
when the tree comes either from an MAlign or a PhylogeneticTree call). For
protein sequences, the global variables NewLogPAM1, AF and gap costs derived
from DMS are assumed. For other types of sequences, the log of a mutation
matrix (e.g. CodonLogPAM1), a vector of natural character frequencies (e.g.
CF) and a function to compute gap costs for a given gap length at a given
PAM distance are needed. The gap cost function is typically of the form
(pam,len)->-37.64+7.434*log10(pam)-(len-1)*1.3961.
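The typical gap cost formula quoted in the synopsis can be evaluated directly. The Python sketch below only transcribes that formula; the function name gap_cost and the sample arguments are chosen for the illustration.

```python
import math

def gap_cost(pam, length):
    """Gap cost for a gap of the given length at the given PAM distance."""
    return -37.64 + 7.434 * math.log10(pam) - (length - 1) * 1.3961

cost_short = gap_cost(100, 1)  # opening cost only
cost_long = gap_cost(100, 3)   # opening cost plus two extensions
```

Each additional gap position lowers the score by 1.3961, while a larger PAM distance makes opening a gap less costly.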
References: GM Cannarozzi, A Schneider and GH Gonnet (2007): Probabilistic
Ancestral Sequences Based on the Markovian Model of Evolution - Algorithms
and Applications, in: D Liberles (editor): Ancestral Sequence
Reconstruction, Oxford University Press.
Examples:
> seqs := ['VAAAR','AARR','VTAARRQQ']:
> ps := [seq(ProbSeq(s,IntToA),s=seqs)]:
> tree := PhylogeneticTree(seqs,[seq(i,i=1..length(seqs))],DISTANCE);
> pas := PASfromTree(ps,tree):
> print(pas);;
pos Most probable chars
1 V 1.00 I 0.00 L 0.00 A 0.00 T 0.00
2 A 0.71 T 0.25 S 0.02 V 0.01 K 0.00
3 A 1.00 S 0.00 V 0.00 G 0.00 T 0.00
4 A 1.00 S 0.00 V 0.00 G 0.00 T 0.00
5 R 1.00 K 0.00 Q 0.00 A 0.00 S 0.00
6 R 0.97 K 0.01 Q 0.00 A 0.00 L 0.00
7 Q 0.76 E 0.05 K 0.04 R 0.03 A 0.02
8 Q 0.76 E 0.05 K 0.04 R 0.03 A 0.02
See Also:
?CreateCodonMatrices ?PASfromMSA ?ProbSeq
?CreateDayMatrices ?ProbAncestor ?PSDynProg
PSDynProg
Function PSDynProg
Calling Sequence: PSDynProg(ps1,ps2,dist,meth)
PSDynProg(ps1,ps2,dist,lnM,freq,gapcosts,meth)
Parameters:
Name Type Description
-----------------------------------------------------------------------------
ps1, ps2 ProbSeq Probabilistic sequences
dist numeric Distance between the two sequences
lnM matrix(numeric) (optional) log. of a 1-PAM matrix
freq array(numeric) (optional) Natural frequencies of the characters
gapcosts procedure (optional) Gapcosts as a function of gap length
meth {Global,Local} (optional) alignment method
Returns:
numeric : ProbSeq
Global Variables: DBGTMP
Synopsis: Dynamic programming over two probabilistic sequences. In the
standard case of proteins, the global variables NewLogPAM1, AF and gap costs
according to the Dayhoff matrices are used. For other types of sequences
(e.g. DNA or codons), the logarithm of a mutation matrix (e.g. CodonLogPAM1)
and the natural frequencies of the characters (e.g. CF) are required. Also,
a gap cost function is needed that returns the costs for a gap of a given
size. This is usually k->FixedDel+(k-1)*IncDel with the coefficients taken
from the CMS matrix for the given distance. The default alignment method is
'Local'.
References: GM Cannarozzi, A Schneider and GH Gonnet (2007): Probabilistic
Ancestral Sequences Based on the Markovian Model of Evolution - Algorithms
and Applications, in: D Liberles (editor): Ancestral Sequence
Reconstruction, Oxford University Press.
Examples:
> ps1 := ProbSeq('RAAVTGAAAQQQFT',IntToA):
> ps2 := ProbSeq('VTGQQQ',IntToA):
> dist := 10:
> aps := PSDynProg(ps1,ps2,dist):
> print(aps);;
41.6760
pos Most probable chars
1 V 1.00
2 T 1.00
3 G 1.00
4 A 1.00
5 A 1.00
6 A 1.00
7 Q 1.00
8 Q 1.00
9 Q 1.00
pos Most probable chars
1 V 1.00
2 T 1.00
3 G 1.00
4
5
6
7 Q 1.00
8 Q 1.00
9 Q 1.00
See Also:
?CreateCodonMatrices ?PASfromMSA ?ProbAncestor
?CreateDayMatrices ?PASfromTree ?ProbSeq
PSubGene
Function PSubGene
Calling Sequence: PSubGene(g,new,newLength)
Parameters:
Name Type
------------------------------------
g Gene
new {posint, posint..posint}
newLength posint
Returns:
Gene
Synopsis: Returns the modified Gene encoding the peptide at offset new with
length newLength or with amino acid range new.
Examples:
See also: ?Gene ?NSubGene
PamMax
Function PamMax( MinSquareTree:Tree )
returns the largest pam distance of two sequences in a
MinSquareTree
PamToCodonPam
Function PamToCodonPam - Convert PAM to CodonPAM.
Calling Sequence: PamToCodonPam(lnM1,CF,Pam)
Parameters:
Name Type Description
-----------------------------------------------------------------------
lnM1 matrix(numeric,64) Logarithm of a 1-PAM codon mutation matrix.
CF array(numeric,64) Codon frequencies
Pam numeric PAM distance to be converted
Returns:
numeric
Synopsis: Converts PAM to CodonPAM. This conversion depends on the amount of
synonymous mutations for a species or set of species, so the logarithm of
the 1-CodonPAM matrix and the codon frequencies are required as arguments.
The conversion is done by inverting the CodonPamToPam function using
Brent's search.
Examples:
> PamToCodonPam(CodonLogPAM1,CF,50);
109.2499
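Inverting a monotone conversion by a one-dimensional search can be sketched as follows. This Python example uses plain bisection as a simpler stand-in for Brent's search, and the forward function toy_codon_pam_to_pam is an invented increasing function, not Darwin's CodonPamToPam, so the numbers it produces are unrelated to the example above.

```python
def invert_increasing(f, target, lo, hi, tol=1e-9):
    """Find x in [lo, hi] with f(x) close to target, for f increasing there."""
    if not (f(lo) <= target <= f(hi)):
        raise ValueError("target is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def toy_codon_pam_to_pam(codon_pam):
    # Hypothetical forward map for illustration only.
    return 0.4 * codon_pam + 0.001 * codon_pam ** 2

codon_pam = invert_increasing(toy_codon_pam_to_pam, 50.0, 0.0, 1000.0)
```

Brent's method reaches the same root with fewer function evaluations; bisection is used here only because it is short and easy to verify.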
See also: ?CodonPamToPam ?CreateCodonMatrices
PamToPerIdent
Function PamToPerIdent - Compute percentage identity from PAM
Calling Sequence: PamToPerIdent(p)
Parameters:
Name Type Description
-----------------------------
p numeric PAM distance
Returns:
numeric
Synopsis: Compute the percentage identity that a pam distance will leave.
Examples:
> PamToPerIdent(250);
19.6841
See also: ?PerIdentToPam
PamWindows
Function PamWindows( MinSquareTree:Tree )
returns a vector containing all different PamWindows in a tree
ParExecuteIPC
Function ParExecuteIPC
Calling Sequence: ParExecuteIPC(queue,ProgFileName,machines,handler,delay,
controls)
Parameters:
Name Type Description
--------------------------------------------------------------------------------------------
queue list({string,structure}) statements parameterizing each job
ProgFileName string File name containing init and job procedures
machines list(string) list of machines to be used
handler {0,procedure} result handler
delay posint delay (secs) between checking machines: default 10
controls string statements about how a job can be executed
Returns:
NULL
Global Variables: Queue StartDate StartTime initCPU istodo killed
logfile mach normal_termination nrCreated nrCycles nrVanished
resultHandler send_mail startable_processes todo
Synopsis: ParExecuteIPC runs the job described in ProgFileName with the
parameters in queue on machines. Before executing a task in parallel on
several machines, several areas must be prepared.
1) Find the machines to be used. The criteria for machines to be used are
that a) they are accessible via the Internet, b) all machines have an account
with the same name, and c) all machines are capable of running darwin and
darwinipc (see ?darwinipc). It is possible to configure ParExecuteIPC to use
certain machines at specific times of the day, only when they have a specific
load or only when no one is logged in. Machine names in this list can have a
suffix of the form ":class" where class is an integer (see example). If class
suffixes are used, a machine with class greater than zero when becoming idle
will start a job already running on a machine of lower class. This avoids
waiting for termination of the last few jobs which are running on slow
machines.
2) Determine what files are needed. All files that are needed must be
available with the same path name on all machines (databases, darwin code,
etc.).
3) Determine the smallest independent job.
4) Determine the variables that parameterize a single job and create a list
(queue) of strings in which each string contains all Darwin statements
required to parameterize a job.
5) Create a file (ProgFileName) containing two parameterless procedures-
init and job. init does the initialization (loads databases, computes Dayhoff
matrices) - its return value is ignored. job does the actual job and must
return the results as a string. Inside job, the global variable PE_job is the
number of the job being executed, and the global variable tmpfile can be used
as the name of a temporary scratch file, for instance to write the results.
The job procedure should be written in such a way that it can be executed
several times within the same run (with different jobs). Note: do not forget
to declare all variables being used in both procedures as global.
6) Optionally, a result handler procedure can be created. The result
handler accepts a job number (an integer) and its result (a string) and
handles the result. Note that handling a job result should only take
negligible time, so this handler typically writes the result to a file. If
you do not provide your own handler, the default handler (indicated by the
number 0 as an argument to ParExecuteIPC) is used. The default result handler
creates one output file per job: the results of job i are stored in
ProgFileName.out.i.
When ParExecuteIPC executes, it automatically creates two files-
ProgFileName.log and ProgFileName.done. In ProgFileName.done, the job numbers
of the completed jobs are listed. If the completion was not successful, then
the job number is preceded by a minus sign. ProgFileName.log is a log of the
process execution. It tells what the status of the machines was when
ParExecuteIPC was started, tells which machine is running each job and the
execution time. It also contains any error messages generated by the
processes. When ParExecuteIPC completes, it sends a mail message containing
some statistics unless the NoMail control statement is passed.
When ParExecuteIPC is killed before completion, it creates a file named
ParExecute.redo. If this file is renamed ParExecAction and the ParExecuteIPC
command is restarted, it will automatically complete all jobs in the
ParExecAction file. When restarted, ParExecuteIPC will also redo any jobs in
the ProgFileName.done file that are preceded by a minus sign. At any time
during the execution of ParExecuteIPC, control statements can be executed by
placing them in a file called ParExecAction in the directory from which the
command was run.
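The sign convention of ProgFileName.done (a minus sign marks an unsuccessful job) can be illustrated with a small Python sketch. The exact layout of the file is not specified here, so the sketch assumes whitespace-separated integers; the function name jobs_to_redo and the sample content are invented.

```python
def jobs_to_redo(done_text):
    """Return the job numbers recorded with a leading minus sign.

    Assumes the text is a whitespace-separated list of integers, which is
    an assumption about the file layout made for this illustration.
    """
    redo = []
    for token in done_text.split():
        number = int(token)
        if number < 0:
            redo.append(-number)
    return redo

failed = jobs_to_redo("1\n2\n-3\n4\n-7\n")
```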
ParExecuteIPC recognizes the following control statements.
StartUsing m Adds m to the pool of machines being used.
StopUsing m Removes m from the pool of machines to be used. Any job
running on m is killed and its results discarded.
Status Write the status of all machines to the log file.
LoginControl m on/off Turn the login control on machine m on or off.
ForcedRun m on/off On machine m, force process to run (ignore BUSY flag)
NiceValue m n Run at nice n on machine m
OffHours m from..to Jobs running on machine m are stopped between from and to
hours (both in 24 hour notation).
MaxJobs m n Run n jobs (as if having n processors) on machine m
LoadThreshold m low hi A job in machine m is stopped when the load on its
machine is greater than hi, and is continued when the
load gets less than low.
LoadThreshold low hi Set global thresholds
RunAlso job Adds job command to the job queue.
KillAll Kill all running jobs and end ParExecuteIPC. Send a
mail message with execution statistics to the user.
Interrupt Kill all running jobs and end ParExecuteIPC. Write the
jobs to be finished into a file, ParExecAction.redo.
NoMail Turns off the sending of execution statistics by email.
There are four ways to send a control statement to a ParExecuteIPC job:
1) As a line of the optional controls parameter when ParExecuteIPC is
invoked.
2) As data being sent to the ParExecuteIPC process via Darwin's IPC feature
(see ?darwinipc). This is the most efficient way and response to the command
is immediate. For example, (assuming that the ParExecuteIPC process has pid
8281 and runs on ru3), typing:
ipcsend SEND ru2 8281:Status
at the operating system prompt will cause the ParExecuteIPC status to be
reported on the log file.
3) As a line of the file ParExecAction in the current directory of the
ParExecuteIPC process. Whenever this file is found, it is read, processed and
deleted.
4) As a line of ParExecAction.pid in the current directory of the
ParExecuteIPC process, where pid is the process id of the ParExecuteIPC
process. Whenever this file is found, it is read, processed and deleted.
If login control is on, jobs are stopped whenever an interactive user logs
in to the machine, and are continued when no user is logged in. Logins of
certain users can be excluded from this checking with the -u switch of the
darwinipc daemon (see ?darwinipc).
For the following example, first create a file with the name "ParExample".
In that file, define procedures with the names init and job. These procedure
names are not optional. In this example, a result handler is also defined
with procedure name "Handler". This procedure name is optional. Here are the
contents of the file "ParExample":
init := proc()
ReadDb('/home/darwin/DB/SwissProt'):
end:
job := proc()
sequ := SearchTag('SEQ',Entry(entry)):
sequence := string(SearchSeqDb(sequ)):
sequence;
end:
Handler := proc(job:integer,t:string)
OpenAppending('Job.results');
printf('%s ',t):
end:
Before running the command, the queue which parameterizes each job, the
machines to be used and the control strings must be defined.
Examples:
> Machines:=['linneus1:2','linneus2:3'];
> Controls := 'OffHours linneus1 8..9
MaxJobs linneus2 4
';
> queue := [seq(sprintf('entry := %d:',i),i=1..10)];
> ParExecuteIPC(queue,'ParExample',Machines,Handler,10,Controls);
See Also:
?ConnectTcp ?ipcsend ?ReceiveDataTcp ?SendTcp
?darwinipc ?ParExecuteSlave ?ReceiveTcp
?DisconnectTcp ?ParExecuteTest ?SendDataTcp
ParExecuteTest
Function ParExecuteTest
Calling Sequence: ParExecuteTest(thisjob,ProgFileName,machine)
Parameters:
Name Type Description
--------------------------------------------------------------------------------
thisjob {string,structure} statements parameterizing one job
ProgFileName string File name containing init and job procedures
machine string machine to be used
Returns:
string
Global Variables: job
Synopsis: ParExecuteTest tests thisjob using the program in ProgFileName,
simulating a ParExecuteIPC run on machine. It is designed to test the setup
of a ParExecuteIPC before running it on multiple machines.
For the following example, first create a file with the name "ParExample".
In that file, define procedures with the names init and job. These procedure
names are not optional. Here are the contents of the file "ParExample":
init := proc()
ReadDb('/home/darwin/DB/SwissProt'):
end:
job := proc()
sequ := SearchTag('SEQ',Entry(entry)):
sequence := string(SearchSeqDb(sequ)):
sequence;
end:
Examples:
> queue := [seq(sprintf('entry := %d:',i),i=1..10)]:;
> ParExecuteTest(queue[2],'ParExample',linneus2):;
Warning: procedure Handler reassigned
May 20 13:16:10 2003: linneus2 creates parallel process
May 20 13:16:10 2003: linneus2(19680) started
May 20 13:16:10 2003: linneus2(19680) initialized (0.0 s CPU)
May 20 13:16:10 2003: linneus2(19680) started job
May 20 13:16:10 2003: linneus2(19680) completed job (0.0 s CPU), result:
MASVKSSSSSSSSSFISLLLLILLVIVLQSQVIECQPQQSCTASLTGLNVCAPFLVPGSPTASTECCNAVQSINHDCMC
May 20 13:16:10 2003: linneus2(19680) ending
May 20 13:16:10 2003: linneus2(19680) ended
See Also:
?ConnectTcp ?ipcsend ?ReceiveDataTcp ?SendTcp
?darwinipc ?ParExecuteIPC ?ReceiveTcp
?DisconnectTcp ?ParExecuteSlave ?SendDataTcp
Paragraph
Class Paragraph - holds contents of a paragraph of text
Template: Paragraph(content1,...)
Paragraph(indent,content1,...)
Returns:
Paragraph
Fields:
Name Type Description
------------------------------------------------------------------
indent integer the integer indentation value
content_i {string,structure} the text content of the Paragraph
Methods: HTMLC LaTeXC Paragraph_type print string
Synopsis: The Paragraph structure holds text that is expected to be laid out
as a paragraph. The integer value indent specifies the number of blank
positions to be added at the beginning of the first line. If the indent
value is negative, then the first line is not indented, but the rest of the
lines will be indented by -indent. Paragraphs are typically part of
Documents or Descriptions or any place where text must be formatted. When a
Paragraph is converted to a string, each content_i is converted to a string,
all concatenated together and properly broken into lines not exceeding the
value of the interface variable screenwidth. A newline character is always
added at the end of the last line of the converted Paragraph. Any newlines
or tab characters in the contents are changed into spaces.
Examples:
> p := Paragraph( 5, 'This text is indented 5 spaces' );
p := Paragraph(5,This text is indented 5 spaces)
> print(p);
This text is indented 5 spaces
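The layout rules in the synopsis can be imitated outside Darwin. The Python sketch below is an illustration under stated assumptions, not Darwin's implementation: the function name layout_paragraph is hypothetical, the default width of 80 stands in for the interface variable screenwidth, and line breaking is delegated to Python's textwrap module.

```python
import textwrap

def layout_paragraph(indent, *contents, screenwidth=80):
    """Lay out contents as one paragraph, following the rules described above."""
    # Each content is converted to a string and all are concatenated.
    text = ''.join(str(c) for c in contents)
    # Newlines and tabs in the contents become spaces.
    text = text.replace('\n', ' ').replace('\t', ' ')
    if indent >= 0:
        first, rest = ' ' * indent, ''       # indent the first line only
    else:
        first, rest = '', ' ' * (-indent)    # indent all lines but the first
    body = textwrap.fill(text, width=screenwidth,
                         initial_indent=first, subsequent_indent=rest)
    # A newline is always added at the end of the last line.
    return body + '\n'

print(layout_paragraph(5, 'This text is indented 5 spaces'), end='')
```

With a negative indent such as -2 the first line starts at the margin and every following line starts two positions in, which gives a hanging indent.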
See Also:
?Block ?Document ?latex ?RunDarwinSession
?Code ?HTML ?List ?screenwidth
?Color ?HyperLink ?PostscriptFigure ?Table
?Copyright ?Indent ?print ?TT
?DocEl ?LastUpdatedBy ?Roman ?View
ParallelAllNucPepMatches
Function ParallelAllNucPepMatches
Option: builtin
Calling Sequence: ParallelAllNucPepMatches(npm,dm,goal)
Parameters:
Name Type Description
----------------------------------------------------------------
npm list(NucPepMatch) a Nucleotide Peptide Match
dm {DayMatrix,list(DayMatrix)} Dayhoff matrix or matrices
goal numeric threshold value
Returns:
NULL
Synopsis: Performs multiple GetAllNucPepMatches computations simultaneously.
This is more efficient than individual GetAllNucPepMatches calls only on some
parallel machines.
See also: ?GetAllNucPepMatches ?NucPepMatch
ParseDimacsGraph
Function ParseDimacsGraph
Calling Sequence: ParseDimacsGraph(s)
Parameters:
Name Type Description
--------------------------------------
s string graph in dimacs format
Returns:
Graph
Synopsis: This function parses a graph in dimacs format and returns a Darwin
graph structure.
Examples:
> ParseDimacsGraph('p edge 5 2
e 2 4 9
e 1 5 2
');
Graph(Edges(Edge(9,2,4),Edge(2,1,5)),Nodes(1,2,3,4,5))
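The example shows how the input lines map onto the result: the "p edge 5 2" line declares 5 nodes and 2 edges, and each "e n1 n2 w" line becomes Edge(w,n1,n2). The Python sketch below reproduces that mapping with plain tuples. It is an illustration, not Darwin's parser; the function name parse_dimacs, the default weight of 1 for edges without a third number, and the skipping of "c" comment lines are assumptions.

```python
def parse_dimacs(text):
    """Return (nodes, edges) with edges stored as (weight, n1, n2) tuples."""
    nodes, edges = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == 'c':
            continue                      # blank line or comment
        if fields[0] == 'p':
            # problem line: p edge <number of nodes> <number of edges>
            nodes = list(range(1, int(fields[2]) + 1))
        elif fields[0] == 'e':
            n1, n2 = int(fields[1]), int(fields[2])
            # assumed default weight when none is given
            weight = float(fields[3]) if len(fields) > 3 else 1.0
            edges.append((weight, n1, n2))
    return nodes, edges

nodes, edges = parse_dimacs('p edge 5 2\ne 2 4 9\ne 1 5 2\n')
print(nodes, edges)
```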
See Also:
?BipartiteGraph ?Graph_minus ?Nodes
?Clique ?Graph_Rand ?Path
?DrawGraph ?Graph_XGMML ?RegularGraph
?Edge ?InduceGraph ?ShortestPath
?EdgeComplement ?MaxCut ?TetrahedronGraph
?Edges ?MaxEdgeWeightClique ?VertexCover
?FindConnectedComponents ?MinCut
?Graph ?MST
ParseNewickTree
Function ParseNewickTree - Converts a tree from newick to darwin format
Calling Sequence: ParseNewickTree(t)
ParseNewickTree(t,modif)
Parameters:
Name Type Description
-----------------------------------------------------------------------------
t string tree in newick format
modif symbol = procedure (optional) modifier procedures for label parsing
Returns:
Tree
Synopsis: The function converts a tree from Newick (and also New Hampshire
eXtended) format to a Darwin tree. Multifurcated nodes will be resolved to a
binary representation with in-between-branches of length 0.
Possible modifier for label parsing:
'InternalLabels'=procedure A function string->anything that is called
with any label assigned to an internal node. The return value is
stored in the 'xtra' field of the node. The default handler returns
NULL for empty labels (such that no 'xtra' field is created), the
content of the 'NHX'-Tag (see References) or else the label itself.
'LeafLabels'=procedure A function string->anything that is called
with 'NHX'-Tags assigned to leaves of the tree. The return value is
stored in the 3rd field of the Leaf data structure. The default
handler returns the content of those tags.
'defaultBranchLength'=nonnegative If only the topology of the tree is given,
one can set a default length. N-ary inner nodes can be
preserved that way, as they are all assigned the same height.
References: Newick format according to the Olsen grammar:
http://evolution.genetics.washington.edu/phylip/newicktree.html
Description of the New Hampshire eXtended format (version 2.0):
http://www.phylosoft.org/forester/NHX.html
Examples:
> t := '(((A:0.2,B:0.3):0.3,(C:0.5,D:0.3):0.2):0.3,E:0.7):0.0;';
t := (((A:0.2,B:0.3):0.3,(C:0.5,D:0.3):0.2):0.3,E:0.7):0.0;
> ParseNewickTree(t);
Tree(Tree(Tree(Leaf(A,0.8000),0.6000,Leaf(B,0.9000)),0.3000,Tree(Leaf(C,1),0.5000,Leaf(D,0.8000))),0,Leaf(E,0.7000))
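In the output above, the numbers stored in the Tree and Leaf structures are not the Newick branch lengths but their running sums from the root; for leaf A this is 0.3 + 0.3 + 0.2 = 0.8. The Python sketch below illustrates that conversion. It is not Darwin's parser: the function names are hypothetical, NHX tags, quoted labels and the resolution of multifurcations are not handled, and a missing branch length is assumed to be 0.

```python
def parse_newick(text):
    """Parse a Newick string into nested (label, branch_length, children) tuples."""
    s = text.strip().rstrip(';')
    pos = 0

    def read_node():
        nonlocal pos
        children = []
        if pos < len(s) and s[pos] == '(':
            pos += 1                          # consume '('
            children.append(read_node())
            while s[pos] == ',':
                pos += 1
                children.append(read_node())
            pos += 1                          # consume ')'
        start = pos
        while pos < len(s) and s[pos] not in ',():':
            pos += 1
        label = s[start:pos]
        length = 0.0                          # assumed default
        if pos < len(s) and s[pos] == ':':
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ',()':
                pos += 1
            length = float(s[start:pos])
        return (label, length, children)

    return read_node()

def leaf_heights(node, parent_height=0.0, out=None):
    """Map each leaf label to its summed distance from the root."""
    if out is None:
        out = {}
    label, length, children = node
    height = parent_height + length
    if not children:
        out[label] = round(height, 4)
    for child in children:
        leaf_heights(child, height, out)
    return out

t = '(((A:0.2,B:0.3):0.3,(C:0.5,D:0.3):0.2):0.3,E:0.7):0.0;'
print(leaf_heights(parse_newick(t)))
```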
See also: ?Leaf ?LeastSquaresTree ?PhylogeneticTree ?Tree
ParsePred
Function ParsePred( MulAlign:array(string), tree )
Generates the prediction of parse regions in a multiple alignment
PartialFraction
Function PartialFraction
Calling Sequence: PartialFraction(r)
PartialFraction(r,eps)
Parameters:
Name Type
----------------------------------------------------------
r a numerical value
eps optional, the desired accuracy of the approximation
Returns:
[integer, posint] : a rational number represented by two integers, p/q
Synopsis: PartialFraction computes an approximation of the input value r as a
rational number. The pair of integers returned, [p,q], should be interpreted
as a rational approximation of r, i.e. p/q is approximately equal to r. The
second argument, eps, must be positive. The computed approximation will have
an error of the same order of magnitude as eps or smaller. If eps is omitted,
the value 1e-5 is used.
Examples:
> PartialFraction(1.234567);
[100, 81]
> PartialFraction(-Pi,0.01);
[-22, 7]
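One standard way to obtain such approximations is the continued-fraction expansion of r, stopping at the first convergent p/q that is close enough. The Python sketch below illustrates that technique. Whether Darwin uses the same algorithm or the same stopping rule is not stated in this help text, so both are assumptions; the function name partial_fraction is hypothetical. For the two inputs in the examples above the sketch yields the same pairs.

```python
import math

def partial_fraction(r, eps=1e-5):
    """Approximate r by p/q using continued-fraction convergents.

    Stops at the first convergent with |p/q - r| <= eps (assumed rule).
    """
    sign = -1 if r < 0 else 1
    target = abs(r)
    x = target
    a = math.floor(x)
    p_prev, q_prev = 1, 0          # convergent before the first one
    p, q = a, 1                    # first convergent: integer part of r
    while abs(p / q - target) > eps:
        frac = x - a
        if frac == 0:
            break                  # r is represented exactly
        x = 1.0 / frac
        a = math.floor(x)
        # standard recurrence for numerators and denominators
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
    return [sign * p, q]

print(partial_fraction(1.234567))
print(partial_fraction(-math.pi, 0.01))
```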
Partitions
Data structure Partitions( )
Function: creates a splits or partitions data structure
Selectors:
Tree: Creates a tree from the given partitions
If the partitions cause conflicts, then VertexCover is used
to remove the conflicts and then a tree is constructed.
Conflicts: Returns a reduced Partitions set that is free of conflicts (VertexCover)
MinSquare: Uses the probabilistic model to create a tree. This is useful if a tree
should be constructed but there are still conflicts in the graph. If you
do not want to use VertexCover to remove the conflicts then this is
an alternative. This way a minimum square tree is produced.
Partitions_GetConflicts
Function Partitions_GetConflicts( )
returns a list of sets. If the set is not empty, it specifies
the conflict with another set. The number in the set is the other
conflicting set in the list
Partitions_GetTree
Function Partitions_GetTree( )
Constructs a binary tree from a set of partitions
PRECONDITIONS:
the partitions must be conflict free and there
must be enough partitions (n-2, n = nr of leaves) to construct a complete tree
Partitions_ResolveConflicts
Function Partitions_ResolveConflicts( )
Data is a list of partitions (list of sets). The procedure
finds the conflicts, creates a graph and uses VertexCover to resolve
the conflicts.
The result is a reduced list of sets that does not contain the conflicting sets.
PatEntry
Class PatEntry - Data structure for entries to the Pat index for the database
DB
Template: PatEntry(a)
Fields:
Name Type Description
-----------------------------------------------------------------------------------
a {integer,range,string,list(integer)} PatEntry number(s) in the database DB
or a string to be searched
Returns:
PatEntry
Methods: AC Entry ID Match PatEntry_type print Sequence string
Synopsis: When a Darwin database is read for the first time, Darwin will
automatically create a Patricia tree data structure from the contents of the
SEQ field for each entry. This is accessed via a Pat index. PatEntry is a
data structure for entries to the Pat index for the database DB. If the
argument is an integer, a list of integers or a range of integers, these are
considered to be entries in the Pat index of the database. If a string is
given, it is assumed to be a sequence, and the Pat index is searched for all
the sequences which contain the string exactly. The result is returned as a
range, even in the case that it is not found (an empty range) which is
useful as it points to the two closest neighbouring sequences in the
database. Searching for exact identity of peptides using PatEntry is very
fast.
Examples:
> PatEntry(1);
PatEntry(1)
> PatEntry(1..5);
PatEntry(1..5)
> PatEntry('HHHHHHHH');
PatEntry(19663305..19663628)
> PatEntry('B');
PatEntry(4667614..4667613)
> PatEntry('C');
PatEntry(4667614..5600515)
> Sequence(PatEntry(CCCCCCCC));
CCCCCCCCCCCNFCCGKFKPPVNESHDQYSHLNRPDGNREGNDMPTHLGQPPRLEDVDLDDVNLGAGGAPVTSQPREQAGGQPVFAMPPPSGAVGVNPFTGAPVAANENTSLNTTEQTTYTPDMVNQKY, CCCCCCCCCCNFCCGKFKPPVNESHDQYSHLNRPDGNREGNDMPTHLGQPPRLEDVDLDDVNLGAGGAPVTSQPREQAGGQPVFAMPPPSGAVGVNPFTGAPVAANENTSLNTTEQTTYTPDMVNQKY, CCCCCCCCCNFCCGKFKPPVNESHDQYSHLNRPDGNREGNDMPTHLGQPPRLEDVDLDDVNLGAGGAPVTSQPREQAGGQPVFAMPPPSGAVGVNPFTGAPVAANENTSLNTTEQTTYTPDMVNQKY, CCCCCCCCLCRDSCVSTWTKNSVANAVATNASSEVSIYSGSFLAILCTFSTGNLGEHRGADAVSLPLVSLFIVLA, CCCCCCCCNFCCGKFKPPVNESHDQYSHLNRPDGNREGNDMPTHLGQPPRLEDVDLDDVNLGAGGAPVTSQPREQAGGQPVFAMPPPSGAVGVNPFTGAPVAANENTSLNTTEQTTYTPDMVNQKY
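The empty range returned for 'B' (4667614..4667613) ends just before the position where the range for 'C' begins, which is what a search in a sorted index produces. The Python sketch below imitates this with a sorted list of suffixes and a binary search. It is an illustration only: the sorted list stands in for Darwin's Patricia tree, the function names are hypothetical, and positions are numbered from 1 to mirror the ranges above.

```python
import bisect

def build_index(sequences):
    """Return all suffixes of all sequences in sorted order."""
    return sorted(seq[i:] for seq in sequences for i in range(len(seq)))

def pat_range(index, pattern):
    """Return the 1-based inclusive range of suffixes that start with pattern.

    If nothing matches, the range is empty (first > last) and sits
    between the two closest neighbours in the index.
    """
    lo = bisect.bisect_left(index, pattern)
    hi = lo
    while hi < len(index) and index[hi].startswith(pattern):
        hi += 1
    return (lo + 1, hi)

index = build_index(['AAC', 'CCA'])
print(index)
print(pat_range(index, 'B'), pat_range(index, 'C'))
```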
See also: ?Entry ?ID ?Match ?Sequence ?string
Path
Function Path - find a path between two nodes of a graph
Calling Sequence: Path(g,n1,n2)
Parameters:
Name Type Description
-------------------------------
g Graph given graph
n1 Node source node
n2 Node destination node
Returns:
list(Edge)
Synopsis: Find a path between n1 and n2, returning all the edges that need to
be traversed in a list. If there is no path, it returns an empty list.
Examples:
> g := Graph( Edges(Edge(1.2,1,2),Edge(2,1,4),Edge(3,1,5),Edge(4,2,3),Edge(5,3,4)),Nodes(1,2,3,4,5));
g := Graph(Edges(Edge(1.2000,1,2),Edge(2,1,4),Edge(3,1,5),Edge(4,2,3),Edge(5,3,4)),Nodes(1,2,3,4,5))
> Path(g,3,5);
[Edge(4,2,3), Edge(1.2000,1,2), Edge(3,1,5)]
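A path of this kind can be found with a breadth-first search that remembers through which edge each node was first reached. The Python sketch below is an illustration and makes no claim about the search Darwin uses internally; the function name find_path is hypothetical and edges are plain (weight, n1, n2) tuples mirroring Edge(weight,n1,n2). Edge weights are ignored, as the synopsis only asks for some path.

```python
from collections import deque

def find_path(edges, source, target):
    """Return the edges of a path from source to target, or [] if none exists."""
    adjacency = {}
    for edge in edges:
        _, n1, n2 = edge
        adjacency.setdefault(n1, []).append((n2, edge))
        adjacency.setdefault(n2, []).append((n1, edge))
    came_from = {source: None}            # node -> (previous node, edge used)
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbour, edge in adjacency.get(node, []):
            if neighbour not in came_from:
                came_from[neighbour] = (node, edge)
                queue.append(neighbour)
    if target not in came_from:
        return []
    path = []
    node = target
    while came_from[node] is not None:    # walk back to the source
        node, edge = came_from[node]
        path.append(edge)
    path.reverse()
    return path

g = [(1.2, 1, 2), (2, 1, 4), (3, 1, 5), (4, 2, 3), (5, 3, 4)]
print(find_path(g, 3, 5))
```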
See Also:
?BipartiteGraph ?Graph_minus ?Nodes
?Clique ?Graph_Rand ?ParseDimacsGraph
?DrawGraph ?Graph_XGMML ?RegularGraph
?Edge ?InduceGraph ?ShortestPath
?EdgeComplement ?MaxCut ?TetrahedronGraph
?Edges ?MaxEdgeWeightClique ?VertexCover
?FindConnectedComponents ?MinCut
?Graph ?MST
PerIdentToPam
Function PerIdentToPam - Compute PAM distance from percentage identity
Calling Sequence: PerIdentToPam(p)
Parameters:
Name Type Description
------------------------------------
p numeric percentage identity
Returns:
numeric
Synopsis: Compute the PAM distance which results in the given percentage
identity.
Examples:
> PerIdentToPam(17);
289.5953
See also: ?PamToPerIdent
Permutation
Class Permutation - a mathematical permutation
Template: Permutation(p)
Permutation(n)
Fields:
Name Type Description
---------------------------------------------------------------
p list(posint) list of integers from 1 to n
n posint creates an identity permutation of size n
Returns:
Permutation
Methods: Permutation_type power Rand string times
Synopsis: A Permutation holds a list of consecutive positive integers which
describe how to permute a set of size n. Permutations can be multiplied
(the product of two permutations a * b is a permutation which is identical
to applying b and then a). Permutations can also be powered, in particular
an inverse permutation is obtained by 1/a.
Examples:
> a := Rand(Permutation(7));
a := Permutation([4, 5, 6, 2, 3, 1, 7])
> b := Rand(Permutation(7));
b := Permutation([5, 6, 1, 2, 4, 7, 3])
> a*b;
Permutation([2, 4, 7, 6, 1, 5, 3])
> 1/a;
Permutation([6, 4, 5, 1, 2, 3, 7])
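The index arithmetic behind these results can be read off the example: entry i of a*b equals b[a[i]], and the inverse sends a[i] back to i. The Python sketch below restates this with plain lists. It is an illustration inferred from the output above, not Darwin's source, and the function names are hypothetical.

```python
def compose(a, b):
    """Product a*b as in the example: entry i of the result is b[a[i]] (1-based)."""
    return [b[a[i] - 1] for i in range(len(a))]

def inverse(a):
    """Inverse permutation: position a[i] of the result holds i (1-based)."""
    inv = [0] * len(a)
    for position, image in enumerate(a, start=1):
        inv[image - 1] = position
    return inv

a = [4, 5, 6, 2, 3, 1, 7]
b = [5, 6, 1, 2, 4, 7, 3]
print(compose(a, b))
print(inverse(a))
```

Composing a with its inverse gives the identity permutation, which is a quick consistency check for both helpers.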
See also: ?CreateRandPermutation ?Mutate ?Rand ?Shuffle
PhyML
Function PhyML - Wrapper for PhyML, a ML tree reconstruction tool
Calling Sequence: PhyML(msa)
Parameters:
Name Type Description
----------------------------------------------------------------------------------------------------------
msa {MAlignment,list(string)} Multiple Sequence Alignment
labels list(string) (optional) Sequence Labels
subst string (optional) substitution model
inv_sites inv_sites=boolean (optional) Estimate invariant sites
gamma_dist gamma_dist={'e',positive} (optional) Use or estimate gamma parameter
rate_cats rate_cats={numeric} (optional) number of discrete rate categories
start_tree start_tree={Tree,string} (optional) start tree for search
nr_bootstrap nr_bootstrap={numeric} (optional) number of bootstrap samples
opt_topo opt_topo=boolean (optional) optimize topology
opt_branch opt_branch=boolean (optional) optimize branchlengths
seqtype seqtype=string (optional) type of sequences (default AA)
search_heuris search_heuris=string (optional) applied search heuristics
LnLperSite LnLperSite=boolean (optional) Report log-likelihood values per site? (default NO)
Returns:
TreeResult
Synopsis: PhyML is a tool to compute maximum likelihood trees from multiple
sequence alignments. For details see manual (Reference section).
Available substitution models:
JTT
subst=string HKY85,JC69,K80,F81,F84,TN93, GTR,LG,WAG,JTT,MtREV,
Dayhoff,DCMut,RtREV,CpREV,VT, Blosum62,MtMam,MtArt,HIVw,
HIVb
seqtype={AA,DNA} specify type of sequence data. By default, Amino Acid is
assumed
Available model modifiers:
inv_sites={'e', 0..1} estimate (e) or set proportion of invariant
sites to a fixed value.
gamma_dist={'e',positive} estimate (e) or set the gamma rate parameter.
rate_cats=integer number of discrete rate categories
Other parameters:
nr_bootstrap=integer determines the number of bootstrap samples to
be evaluated. Default=0
start_tree={Tree,'MP','BioNJ'} specifies the start topology for the ML
search. 'MP' uses a maximum parsimony tree and
'BioNJ' starts with a Neighbor-Joining tree.
Alternatively, you can pass a starting
topology. Default='MP'
search_heuris={NNI,SPR,BEST} specifies the applied search heuristics. By
default, NNI is used.
opt_topo=boolean specifies, whether or not the topology is
optimized.
opt_branch=boolean specifies, whether or not the branchlengths are
optimized.
References: Guindon S., Gascuel O. A simple, fast, and accurate algorithm to
estimate large phylogenies by maximum likelihood, Systematic Biology,
52(5):696-704, 2003.
Examples:
> msa := Rand(MAlignment):;
> PhyML(msa, 'subst'='LG','inv_sites'='e');
TreeResult(Tree(Tree(Leaf(RandSeq7, 1e-10),0,Leaf(RandSeq9,0.2974)),0,Tree(Leaf(
RandSeq6,0.2229),0.2173,Tree(Tree(Tree(Leaf(RandSeq5,1.0211),1.0162,Leaf(
RandSeq8,1.3645)),0.7386,Tree(Tree(Leaf(RandSeq3,1.0698),1.0698,Leaf(RandSeq10,
1.5540)),0.7504,Leaf(RandSeq1,0.7580))),0.4718,Tree(Leaf(RandSeq2,0.4878),0.4878
,Leaf(RandSeq4,0.6897))))),ML,table([{[Likelihood, [-3094.8008]]}, {}, {}, {[
InvSites, [0.00500000]]}, {}, {[SubstModel, [LG]]}, {[CPUtime, [12.4500]]}, {},
{[Method, [Phyml 3.0]]}, {[Alpha, [99.8640]]}, {}, {}, {}],unassigned))
See Also:
?LeastSquaresTree ?MAlign ?RellTree ?Tree
?MafftMSA ?PhylogeneticTree ?RobinsonFoulds ?TreeResult
PhylogeneticTree
Function PhylogeneticTree - Constructs Phylogenetic Trees
Calling Sequence: PhylogeneticTree(Seqs,Ids,Mode)
Parameters:
Name Type Description
---------------------------------------------------------------------------
Seqs list Sequences or Entries from which a tree is built
Ids {list,procedure} list of id tags or procedure that produces tags
Mode symbol method - DISTANCE, PARSIMONY or LINEAGE
msa MAlignment optional Multiple sequence alignment
allall matrix optional all vs all matrix of Alignments
Returns:
Tree
Global Variables: DimensionlessFit MST_Qual printlevel
Synopsis: PhylogeneticTree is a method for constructing phylogenetic trees
either by least squares minimization of the differences between the distances
in the real data and in the computed tree, or by minimizing the number of
changes/mutations that would be required.
If the mode passed is DISTANCE, an all-against-all (each sequence aligned
against each other sequence) is calculated and the distance and variance
information is used to compute a binary tree which approximates via least
squares the distance information. If an optional array of Alignment data
structures is passed as an argument, this all-against-all will be used
instead of recalculating it. Ten trees are constructed from random starting
points and the best tree is returned. All trees are optimized using
iterations of 4-optim and 5-optim which optimize all subtrees with 4 and 5
branches respectively. The quality of the fit is measured by the sum of the
squares of the weighted deviations divided by (n-2)(n-3)/2. This value is
stored in the global variable MST_Qual. If the global variable MinLen is
assigned a positive value, it will determine the minimum length between
internal or external nodes. If not set, 0.1 PAM is used. The distance of
the branches are the approximate distances calculated by least squares in
PAM units. Since the tree is made from alignments, the input sequences must
be protein or DNA sequences.
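The divisor (n-2)(n-3)/2 is the number of pairwise distances, n(n-1)/2, minus the 2n-3 branch lengths of an unrooted binary tree with n leaves, i.e. the number of distances left over after fitting the branches. The Python sketch below computes a fit measure of this form. It is an illustration, not Darwin's code: the function name fit_quality is hypothetical, and weighting each squared deviation by the variance of the corresponding distance is an assumption about what "weighted deviations" means here.

```python
from itertools import combinations

def fit_quality(observed, variance, fitted):
    """Sum of weighted squared deviations divided by (n-2)(n-3)/2.

    observed, variance and fitted are n x n symmetric matrices (lists of
    lists); only the entries above the diagonal are used. Requires n >= 4.
    """
    n = len(observed)
    degrees_of_freedom = (n - 2) * (n - 3) / 2
    total = sum((observed[i][j] - fitted[i][j]) ** 2 / variance[i][j]
                for i, j in combinations(range(n), 2))
    return total / degrees_of_freedom
```

With n = 4 there are 6 pairwise distances and 5 branches, so the divisor is 1 and the measure is the weighted sum of squares itself.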
If the mode passed is PARSIMONY, random trees are constructed and then
optimized with 4- and 5-optim using the parsimony criterion (the tree with
the fewest mutations is the best tree). This is sometimes also
called character compatibility. Each position of the given sequences is
treated as a character. The goal of the parsimony trees is to build a tree
such that we can assign character changes on the branches of the tree and
this total number of changes is minimized. Amino acids or DNA bases can be
used as characters, but also any other arbitrary symbol (characters are
restricted to be ASCII characters though). If a MAlignment data structure
is passed as an optional argument, this alignment is used. If all the
sequences are exactly the same length, it is assumed that they have been
already aligned and they are taken as given. If not, the sequences in Seqs
are aligned with the circular tour method (See ?MAlign). The global
variable MST_Qual is assigned the number of changes that the returned tree
requires. The distances in the tree are taken from the parsimony
construction and indicate the minimum number of changes that must occur in
that particular branch. The Parsimony method accepts an additional
parameter which indicates which method to use to build the initial tree.
This tree is later optimized. The methods to build the initial tree are:
NJRandom Neighbour Joining with randomness in the selection of the
best pair to join.
CircularTour A circular tour of minimum cost is built at each step, and
the pair of nodes with least cost is selected to be joined.
NeighJoin Neighbour Joining. At each step the two subtrees with the
least cost to join them are joined.
DynProgr(k) Use a dynamic programming approach among the k best results
of Neighbour Joining.
DynProgr Identical to DynProgr(10)
OptInsertion Insert each leaf/subtree in the best possible branch of the
previously built subtrees. This is the default choice, it is
a bit slow, but normally gives the best trees.
Random Leaves/subtrees are joined randomly. Quite fast, but
produces poor trees.
LowerBound Do not build a tree, just compute a lower bound on the cost
of the tree (minimum number of changes).
SemiOptInsertion(t) Like OptInsertion, but limit the search of the best
insertion to t seconds.
SemiOptInsertion Synonym of SemiOptInsertion(10).
If the mode passed is StrictCharacterCompatibility, then it is assumed that
the Seqs are strings (all of the same lengths) of binary characters. Any
symbols can be used for the characters. If the characters are not
compatible, an error is given with the first pair of characters which are
not compatible. The global variable MST_Qual will contain the minimum
number of character changes, which is equal to the number of informative
characters (and never greater than the length of the sequences of
characters).
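For binary characters, pairwise compatibility can be checked with the four-combination rule: two positions are incompatible exactly when all four combinations of their two states occur among the sequences. The Python sketch below applies that rule and reports the first offending pair. It is an illustration of the general rule, not a description of Darwin's internal check; the function name is hypothetical and positions are numbered from 0 as usual in Python.

```python
from itertools import combinations

def first_incompatible_pair(seqs):
    """Return the first pair of positions (i, j) that is incompatible, or None.

    seqs are equal-length strings whose positions each take two states.
    """
    length = len(seqs[0])
    for i, j in combinations(range(length), 2):
        observed = {(s[i], s[j]) for s in seqs}
        if len(observed) == 4:        # all four state combinations occur
            return (i, j)
    return None

# The sequences of the parsimony example in this help page are not strictly compatible:
print(first_incompatible_pair(['B1xj', 'B2zj', 'G2zi', 'G1xi', 'G2xi']))
```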
If the mode passed is LINEAGE, then it is assumed that the Seqs are lists
containing lineage descriptions. The lists are assumed to classify each
sequence from the most general to the most specific class. The lineage
descriptions have to be consistent, that is if a particular class is used,
then it should always be preceded with the same sequence of classes. The
classes are typically strings, but could be any valid Darwin object.
Examples:
> Ids := ['one','two','three','four']:
> Seqs := ['RTHKLPEMNVC', 'KSHKLPEMNVC', 'SHKLMNVC', 'HKLPEMNVC']:
> PhylogeneticTree(Seqs,Ids,DISTANCE);
> MST_Qual;
0.01116240
> PhylogeneticTree(Seqs,Ids,PARSIMONY);
Tree(Tree(Leaf(one,2.5000,1),0.5000,Leaf(four,1.5000,4)),0,Tree(Leaf(three,2.5000,3),0.5000,Leaf(two,1.5000,2)))
> Seqs := [B1xj,B2zj,G2zi,G1xi,G2xi]:
> PhylogeneticTree(Seqs,[seq(i,i=1..5)],parsimony);
Tree(Tree(Tree(Leaf(1,3.5000,1),1.5000,Leaf(4,1.5100,4)),0.5000,Leaf(5,0.5100,5)),0,Tree(Leaf(2,2.5000,2),0.5000,Leaf(3,0.5100,3)))
> MST_Qual;
6
See Also:
?BootstrapTree ?Leaf ?SignedSynteny
?ComputeDimensionlessFit ?LeastSquaresTree ?Synteny
?DrawTree ?MAlignment ?Tree
?Entry ?RBFS_Tree ?Tree_matrix
?GapTree ?Sequence
Plot2Gif
Function Plot2Gif - convert a plot output to a gif file
Calling Sequence: Plot2Gif(opt)
Parameters:
Name Type Description
-----------------------------------------------------------------------
opt 'landscape' (optional) produce the gif in landscape format
opt 'portrait' (optional) produce the gif in portrait format
opt output = string (optional) file name to place the result
Returns:
NULL
Synopsis: Uses underlying unix/linux commands to convert the output of a
Draw/Plot command to a xxx.gif file. The commands used are pstopnm and
ppmtogif and may not exist in all versions of the operating systems.
Examples:
> Plot2Gif( landscape, output='figure1.gif' );
See Also:
?BrightenColor ?DrawPlot ?Set
?ColorPalette ?DrawPointDistribution ?SmoothData
?DrawDistribution ?DrawStackedBar ?StartOverlayPlot
?DrawDotplot ?DrawTree ?StopOverlayPlot
?DrawGraph ?GetColorMap ?ViewPlot
?DrawHistogram ?PlotArguments
PlotArguments
Class PlotArguments - structure to hold plotting/drawing options
Template: PlotArguments(Title,TitleX,TitleY,TitlePts,Lines,Grid,LabelFormat,
GridFormat,Colors,Axis)
Fields:
Name Type Description
--------------------------------------------------------
Title string text to be displayed in the plot
TitleX numeric x coordinate of the title
TitleY numeric y coordinate of the title
TitlePts numeric point size of the title
Lines boolean
Grid boolean
LabelFormat string
GridFormat string
Colors string colour map
Axis boolean axis will be drawn
Returns:
PlotArguments
Methods: draw PlotArguments_type Title
Synopsis: Structure to hold plot options. This structure is used internally
by several drawing functions. The way of filling the values is uniform for
all the functions, and these accept the values in the following format:
Title = string text to be displayed in the plot
TitleX = numeric x coordinate of the title
TitleY = numeric y coordinate of the title
TitlePts = numeric point size of the title
Lines = boolean draw horizontal lines
Grid = boolean draw a grid (horizontal and vertical lines)
LabelFormat = string printf-style format for labels
GridFormat = string printf-style format for Lines or Grid values
Colors = list list of colors suitable for GetColorMap
Axis = boolean draw x and y axes
See Also:
?BrightenColor ?DrawPlot ?Set
?ColorPalette ?DrawPointDistribution ?SmoothData
?DrawDistribution ?DrawStackedBar ?StartOverlayPlot
?DrawDotplot ?DrawTree ?StopOverlayPlot
?DrawGraph ?GetColorMap ?ViewPlot
?DrawHistogram ?Plot2Gif
PlotIndex
Function PlotIndex - Plot a Variation Index
Calling Sequence: PlotIndex(ma)
Parameters:
Name Type Description
----------------------------------------------------
ma array(string) multiple sequence alignment
index array(numeric) a variation index
Returns:
NULL
Synopsis: Plots a histogram from the variation index.
See also: ?KWIndex ?PrintIndex ?ProbIndex ?ScaleIndex
Poisson_Rand
Function Poisson_Rand - Generate random Poisson-distributed integers
Calling Sequence: Rand(Poisson(m))
Returns:
integer
Synopsis: This function returns a random Poisson-distributed integer with
average m and variance m. The Poisson distribution is the limiting case of
the binomial distribution when n -> infinity and n*p=m remains bounded. In
mathematical terms, the probability that the outcome is i is exp(-m) * m^i /
i! (for 0 <= i). Poisson_Rand uses Rand() which can be seeded by either the
function SetRand or SetRandSeed.
References: Handbook of Mathematical functions, Abramowitz and Stegun,
26.1.22
Examples:
> Rand(Poisson(20));
12
> Rand(Poisson(1000));
979
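The probability formula in the synopsis can be evaluated directly, and a random value can be drawn by accumulating these probabilities until they exceed a uniform random number. The Python sketch below illustrates both. It is not Darwin's generator: the function names are hypothetical, the inversion method is an assumed choice, and the probabilities are computed in log space so that large m (such as 1000) does not underflow.

```python
import math
import random

def poisson_pmf(m, i):
    """Probability of outcome i: exp(-m) * m^i / i!, evaluated in log space."""
    return math.exp(-m + i * math.log(m) - math.lgamma(i + 1))

def poisson_rand(m, rng=random):
    """Draw one Poisson(m) integer by inverting the cumulative distribution."""
    u = rng.random()
    # Safety bound far in the tail, so rounding can never cause an endless loop.
    limit = int(m + 20 * math.sqrt(m) + 20)
    i = 0
    cumulative = poisson_pmf(m, 0)
    while cumulative < u and i < limit:
        i += 1
        cumulative += poisson_pmf(m, i)
    return i

rng = random.Random(2013)     # fixed seed, only to make the run repeatable
print([poisson_rand(20.0, rng) for _ in range(5)])
```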
See Also:
?Beta_Rand ?Exponential_Rand ?Multinomial_Rand ?StatTest
?Binomial_Rand ?FDist_Rand ?Normal_Rand ?Std_Score
?ChiSquare_Rand ?GammaDist_Rand ?SetRand ?Student_Rand
?CreateRandSeq ?Geometric_Rand ?SetRandSeed ?Zscore
?Cumulative ?Graph_Rand ?Shuffle
Polar
Data structure Polar( Rho:numeric, Theta:numeric )
Representation of complex numbers in polar form. The number is
Rho * exp( i*Theta ).
- Operations:
Initialization: a := Polar(1,Pi/2);
b := Polar(0,1);
All arithmetic operations:
a+b, a-b, a*b, a/b, a^b, |a|
Special functions exp(a), ln(a), sin(a), cos(a), tan(a)
Printing: print(a);
printf( '%.3f', a );
Type testing: type(a,Polar);
- Conversions:
To string : string(a)
Complex : Complex(a)
Polar : Polar(Complex(...))
- Selectors:
a[Re] : real part
a[Im] : imaginary part
a[Rho] : radius or absolute value
a[Theta] : angle, (-Pi < a[Theta] <= Pi)
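The relation between the selectors can be illustrated outside Darwin with Python's cmath module: Re and Im follow from Rho and Theta, and a product multiplies the radii and adds the angles. The sketch below is an illustration only; the function names are hypothetical and the angle is folded back into the principal range by converting through the rectangular form.

```python
import cmath
import math

def to_complex(rho, theta):
    """Rectangular form of Rho * exp(i*Theta)."""
    return cmath.rect(rho, theta)

def polar_multiply(p, q):
    """Multiply two (rho, theta) pairs: radii multiply, angles add.

    The angle of the result is folded back into the principal range.
    """
    product = cmath.rect(p[0] * q[0], p[1] + q[1])
    return cmath.polar(product)

a = to_complex(1.0, math.pi / 2)          # corresponds to Polar(1,Pi/2)
print(a.real, a.imag)                     # real and imaginary parts
print(polar_multiply((2.0, 3 * math.pi / 4), (3.0, 3 * math.pi / 4)))
```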
PolishAngles
Function PolishAngles( g:Graph, angles:array(numeric) )
Attempts to polish angles by collapsing g to a tree.
PositionTree
Function PositionTree( ma:array(string), t:Tree, pos:posint )
Creates a tree containing the amino acids of position pos in ma as labels.
PostscriptFigure
Class PostscriptFigure - figure given by a postscript file (Darwin or other)
Template: PostscriptFigure()
Fields:
Name Type Description
---------------------------------------------------------------------------
psfile string (opt) file name containing the postscript
caption Caption = string (opt) caption to describe the figure
convmeth Convert = string (opt) conversion method
linkas LinkAs = string (opt) path of image source in HTML
newfn PlaceUnder = string (opt) name of converted image file
modif string = string (opt) pattern substitutions for input file
Returns:
PostscriptFigure
Methods: HTMLC LaTeXC PostscriptFigure_type Rand string
Synopsis: A PostscriptFigure object is constructed from a postscript file
which could be generated by a Darwin Draw command or from some other source,
e.g. xfig. This structure is normally held in a Document and is displayed
as appropriate (as HTML, latex or a string). If no psfile is given, it is
assumed that it comes from a Draw command and hence plotoutfile is used.
When this structure is converted to HTML, a .gif or .jpg file has to be
made. The default method is 'auto' which will use the UNIX tool 'convert'
to automatically create a .jpg file without user interaction. If this does
not lead to satisfactory results or some modifications (e.g. rotation) have to
be performed, the method 'gimp' should be used. This will open the file in
Gimp and give control to the user. Hence Gimp has to be available on the
system. The LinkAs option allows linking the file under a different path
when converting to HTML. With PlaceUnder a filename for the converted file
can be given. This filename also determines the image format (.gif or .jpg).
If it is converted to latex, the postscript is converted to encapsulated
postscript with ps2eps, which should also be available. Conversion to a
string just prints a box with a unix command suitable to display the
contents.
The modifiers are a simple mechanism to modify previously created postscript
files. Textual substitution will be performed (length issues are ignored,
and most of the time this works well). These substitutions should be based
on a relatively unique pattern; short patterns that may coincide with other
postscript commands are bound to be disastrous.
Examples:
> PostscriptFigure( 'PAMgraph.ps', Caption='Score vs PAM');
PostscriptFigure(PAMgraph.ps,Caption = Score vs PAM,Convert = auto,PlaceUnder = PAMgraph.jpg,LinkAs = PAMgraph.jpg)
See Also:
?Block ?Document ?latex ?RunDarwinSession
?Code ?HTML ?List ?screenwidth
?Color ?HyperLink ?Paragraph ?Table
?Copyright ?Indent ?print ?TT
?DocEl ?LastUpdatedBy ?Roman ?View
PredictGenes
Function PredictGenes( ms:list(NucPepMatch) )
Predict the best disjoint genes implied by ms. All matches in ms must
refer to the same nucleotide sequence. Returns
genes: list([cds: list(posint..posint), simil: numeric, nr: set]),
exons: list(Region),
introns: list(Region).
PrintIndex
Function PrintIndex - Prints a Variation Index
Calling Sequence: PrintIndex(ma,index)
Parameters:
Name Type Description
----------------------------------------------------
ma array(string) multiple sequence alignment
index array(numeric) a variation index
Returns:
NULL
Synopsis: Prints the multiple alignment, followed by the indices, one
position per row.
Examples:
> ma := [ 'AKQVVLLIFGSW', 'AEPIVPLLFGMW', 'AEVIVPLLFGVW',
'AEPIVPLLFGLW', ' EPIVPLL__MW', ' PIVPLLFGMW']:
> tree := Tree(Tree(Leaf(3,-50.3881,c),-31.1550,Tree(Tree(
Leaf(2,-52.2087,b),-50.4844,Tree(Leaf(6,-71.9795,f),-53.3023,
Leaf(5,-92.0774,e))),-41.0671,Leaf(4,-48.3231,d))),0,Leaf(1,-62.9954,a)):
> prxd := ProbIndex (ma, tree);
prxd := [1.5749, 3.0664, 6.2335, 3.1332, 2.1029, 3.9343, 1.6915, 2.9950, 2.0193, 1.5307, 6.9936, 2.2708]
> PrintIndex(ma,prxd);
1 AAAA 1.57
2 KEEEE 3.07
3 QPVPPP 6.23
4 VIIIII 3.13
5 VVVVVV 2.10
6 LPPPPP 3.93
7 LLLLLL 1.69
8 ILLLLL 2.99
9 FFFF_F 2.02
10 GGGG_G 1.53
11 SMVLMM 6.99
12 WWWWWW 2.27
See also: ?KWIndex ?PlotIndex ?ProbIndex ?ScaleIndex
PrintInfo
Function PrintInfo( entries:{integer,structure}, tag1:string )
Print the entry number and information tags (tag1 and additional optional
tags) for an entry given by number or several entries given by a data
structure.
PrintMatrix
Function PrintMatrix
Calling Sequence: PrintMatrix(A,format)
Parameters:
Name Type
----------------------------------------------------
A a rectangular or square matrix
format optional, a formatting string, as in printf
Returns:
NULL
Synopsis: This function pretty-prints a square or rectangular matrix. It is
normally used by the print() command. If called directly, the user can
specify the format to be used. Without a printing format, it will calculate
a reasonable format to fit on the screen width.
Examples:
> PrintMatrix( [[1,2], [3,4]] );
1 2
3 4
> PrintMatrix( [[1/7,2/7], [3/7,4/7]], '%13.10f');
0.1428571429 0.2857142857
0.4285714286 0.5714285714
See also: ?print ?printf (for the codes accepted as format)
PrintStringMatch
Function PrintStringMatch( pat:string, t:string )
Print the alignment of a string (pat) matched against a text (t).
PrintTreeSeq
Function PrintTreeSeq( t:Tree )
Print out sequences cross referenced in a tree.
ProbAncestor
Function ProbAncestor
Calling Sequence: ProbAncestor(ps1,ps2,d1,d2)
ProbAncestor(ps1,ps2,d1,d2,lnM,freq)
Parameters:
Name Type Description
--------------------------------------------------------------
ps1, ps2 ProbSeq Probabilistic sequences
d1, d2 numeric Distances to the common ancestor
lnM matrix(numeric) (optional) log. of a 1-PAM matrix
freq array(numeric) (optional) character frequencies
Returns:
ProbSeq
Global Variables: LogLikelihoods
Synopsis: Given two probabilistic sequences and the distances to their common
ancestor, this function computes the probabilistic ancestral sequence (PAS).
The logarithm of a 1-PAM matrix is needed to compute the mutation matrices
for the two distances. The mutation matrix NewlogPAM1 is the default value
and can be used for amino acid sequences. For codon sequences CodonLogPAM1
is recommended. The ancestral probabilities depend on the natural
frequencies of the characters. By default, the amino acid frequencies AF are
used. The global variable LogLikelihoods will be assigned to an array
containing the ln of the likelihoods at each position.
References: GM Cannarozzi, A Schneider and GH Gonnet (2007): Probabilistic
Ancestral Sequences Based on the Markovian Model of Evolution - Algorithms
and Applications, in: D Liberles (editor): Ancestral Sequence
Reconstruction, Oxford University Press.
Examples:
> ps1 := ProbSeq('AARV',IntToA):
> ps2 := ProbSeq('AVVV',IntToA):
> pas := ProbAncestor(ps1,ps2,10,10):
> print(pas);;
pos Most probable chars
1 A 1.00 S 0.00 V 0.00 G 0.00 T 0.00
2 A 0.56 V 0.43 L 0.00 T 0.00 I 0.00
3 V 0.56 R 0.34 A 0.02 L 0.02 K 0.02
4 V 1.00 I 0.00 L 0.00 A 0.00 T 0.00
See Also:
?CreateCodonMatrices ?PASfromMSA ?ProbSeq
?CreateDayMatrices ?PASfromTree ?PSDynProg
ProbBallsBoxes
Function ProbBallsBoxes - probability of hitting k eps-boxes with n balls
Calling Sequence: ProbBallsBoxes(k,n,eps)
Parameters:
Name Type Description
----------------------------------------------------------
k posint number of boxes
n posint number of balls randomly thrown in [0,1]
eps positive 0
Examples:
> ProbBallsBoxes(3,10,0.0001);
7.1924e-10
See Also:
?Cumulative ?DigestWeights ?MassProfileResults ?StatTest
?DigestAspN ?DynProgMass ?OutsideBounds ?Std_Score
?DigestionWeights ?DynProgMassDb ?ProbCloseMatches
?DigestSeq ?enzymes ?SearchMassDb
?DigestTrypsin ?lnProbBallsBoxes ?Stat
ProbCloseMatches
Function ProbCloseMatches - prob of k eps-close matches among U(0,1) values
Calling Sequence: ProbCloseMatches(k,n1,n2,eps)
Parameters:
Name Type Description
-----------------------------------------------------------------
k posint number of matches
n1 posint number of points randomly thrown in [0,1]
n2 posint number of points randomly thrown in [0,1]
eps positive 0
Examples:
> ProbCloseMatches(4,10,22,0.0001);
5.7379e-08
See Also:
?Cumulative ?OutsideBounds ?SearchMassDb ?StatTest
?DynProgMassDb ?ProbBallsBoxes ?Stat ?Std_Score
ProbDynProg
Function ProbDynProg - Probabilistic dynamic programming
Option: builtin
Calling Sequence: ProbDynProg(A,B,f,w,FixedDel,IncDel)
Parameters:
Name Type
--------------------------------
A array(array(numeric))
B array(array(numeric))
f array(numeric)
w posint
FixedDel numeric
IncDel numeric
Returns:
NULL
Synopsis: Probabilistic dynamic programming.
Examples:
See also:
ProbIndex
Function ProbIndex - Compute the Probability Index
Calling Sequence: ProbIndex(ma,t)
Parameters:
Name Type Description
--------------------------------------------------
ma array(string) multiple sequence alignment
t Tree a phylogenetic tree
Returns:
list(numeric)
Synopsis: Computes a variation index defined as -log10( Probability{position}
) for all positions of a multiple alignment.
Examples:
> ma := [ ' -------------------------FPEVVGKTVDQA ..(535).. CSPRKGTKT'];
ma := [ -------------------------FPEVVGKTVDQAREYFTLHYPQ , -------------------IASAGFVRDAQGNCIK--- , AKQVVLLIFGSWQLARERLANEMRKAVAY__TFL__NFDMGRQPLSMHYSDKVCSPRMSTET, AEPIVPLLFGMWRLKRKKANNKLLRCVKY__TLLARNTSDGREPVACRYSEKICSPRTGTKT, AEVIVPLLFGVWRLKREERTYTLLQCVKY__VFLARNTVAGNRPLSKKFSEKVCSPRK , AEPIVPLLFGLWQLAREKASNTLLQCVKY__VFLARNTVAGRRPLKMKYSDKVCSPRKGAKT, EPIVPLL__MWQLAIEKSSNTLLQCVK__KVFLARKTVAGRRPLSMKFSDKVCNPRKGTKT, PIVPLLFGMWQLAREKASNTLLQCVKYYYVFLARNTVAGRRPLSMKYSDKVCSPRKGTKT]
> tree := Tree(Tree(Leaf(b,-250.0000,2),-2.8422e-14, ..(272).. 00,3)))))));
tree := Tree(Tree(Leaf(Permutation([5, 6, 1, 2, 4, 7, 3]),-250,2),-2.8422e-14,Leaf(Permutation([4, 5, 6, 2, 3, 1, 7]),-250,1)),0,Tree(Leaf(h,-250,8),-209.7583,Tree(Leaf(g,-260.8121,7),-227.6537,Tree(Leaf(f,-256.9830,6),-233.8701,Tree(Leaf(d,-240.9182,4),-235.7326,Tree(Leaf(e,-252.2867,5),-237.4908,Leaf(c,-239,3)))))))
> prxd := ProbIndex (ma, tree);
prxd := [1.6978, 2.8954, 5.5145, 2.4769, 1.8613, 3.2753, 1.5187, 2.4065, 1.8195, 1.4028, 6.9273, 2.1637, 4.8046, 1.5187, 5.2698, 4.5194, 3.1698, 4.0876, 6.5461, 5.0991, 4.2424, 4.5438, 2.7156, 4.4407, 5.2065, 4.4283, 2.9725, 4.3070, 2.9907, 3.7505, 6.7152, 6.2474, 5.2650, 4.0940, 3.4090, 4.0343, 6.7206, 6.0418, 7.7801, 6.6427, 2.4979, 4.9061, 5.6526, 3.3186, 4.0506, 6.6403, 7.4820, 5.9496, 5.6263, 3.3733, 5.2957, 1.9501, 2.6816, 2.0689, 3.7597, 1.7027, 1.8456, 5.0920, 2.7307, 3.9599, 2.8101, 1.8395]
See also: ?KWIndex ?PlotIndex ?PrintIndex ?ScaleIndex
ProbSeq
Class ProbSeq - stores a generic probabilistic sequence
Template: ProbSeq(ProbVec,CharMap)
Fields:
Name Type Description
------------------------------------------------------------
ProbVec {string,array(array)} Probability vectors
CharMap procedure Character mapping function
Methods: print ProbSeq_type Sequence
Synopsis: ProbSeq stores a generic probabilistic sequence (i.e. any type of
sequence: amino acids, nucleotides, codons or others) in the form of
probability vectors. Each position of the sequence is a vector giving the
probability of each possible character. The probabilities at each position
sum to 1, except for vectors containing only zeros, which denote a gap at
that position. A ProbSeq can alternatively be constructed from a sequence
given as a string and a mapping function (typically one of IntToA, IntToB or
CIntToCodon). The constructor then builds a probabilistic sequence with a 1
for the known character and 0 otherwise. If only the probability vectors
are given, the constructor tries to find the appropriate mapping function
based on the number of characters.
Examples:
> ps1 := ProbSeq('ADRIAN',IntToA);
ps1 := ProbSeq([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],IntToA)
> ps2 := ProbSeq([[.5,0,.5,0],[.3,.7,0,0]]);
ps2 := ProbSeq([[0.5000, 0, 0.5000, 0], [0.3000, 0.7000, 0, 0]],IntToB)
> print(ps2);
pos Most probable chars
1 A 0.50 G 0.50
2 C 0.70 A 0.30
See Also:
?CIntToCodon ?IntToB ?PASfromTree ?PSDynProg
?IntToA ?PASfromMSA ?ProbAncestor
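The construction from a string can be sketched outside Darwin. The following Python fragment is only an illustration of the data layout described in the Synopsis, not the Darwin implementation. The function names are hypothetical, and the amino acid order is an assumption: only the positions of A, R, N, D and I are confirmed by the 'ADRIAN' example above.

```python
# Assumed character order for IntToA; the 'ADRIAN' example above confirms
# only A=1, R=2, N=3, D=4 and I=10. The rest is an assumption.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
# Assumed character order for IntToB, consistent with the print(ps2) example.
BASES = "ACGT"


def one_hot_prob_seq(seq, alphabet=AMINO_ACIDS):
    """One probability vector per position: 1 for the known character, 0 elsewhere."""
    vectors = []
    for ch in seq:
        vec = [0] * len(alphabet)
        vec[alphabet.index(ch)] = 1
        vectors.append(vec)
    return vectors


def most_probable(vec, alphabet, top=5):
    """(character, probability) pairs with non-zero probability, most probable first."""
    pairs = [(c, p) for c, p in zip(alphabet, vec) if p > 0]
    return sorted(pairs, key=lambda cp: -cp[1])[:top]


if __name__ == "__main__":
    print(one_hot_prob_seq("ADRIAN")[1])           # vector for 'D'
    print(most_probable([0.3, 0.7, 0, 0], BASES))  # most probable characters
```

A vector of all zeros would stand for a gap; most_probable returns an empty list for such a position.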
Process
Class Process - structure to hold Process information
Template: Process(Pid,Job,Stopped,EventTime,JobTime)
Fields:
Name Type Description
-----------------------------------
Pid integer
Job integer
Stopped boolean
EventTime numeric
JobTime numeric
ElapsedTime string
Returns:
Process
Methods: Process_type select
Synopsis: This data structure holds information about a particular process in
a machine. The main application is for parallel processing and hence it
contains all sorts of status information.
See also: ?darwinipc ?Machine ?ParExec2
Protect
Function Protect - Protect fields from a class
Calling Sequence: Protect(classname,field1,...)
Parameters:
Name Type Description
-------------------------------------------------
classname symbol a class name to be protected
field1 symbol a field name of classname
Returns:
NULL
Global Variables: printlevel
Synopsis: Protect sets up the appropriate mechanism so that the named fields
of the class cannot be changed by any function other than the methods
already defined at the time that Protect is called. If Protect is called
without any field name, then all the fields of the data structure which have
not yet been protected are protected. Darwin does not currently support
hiding, that is, preventing the user from reading a value from a field. We
do not see any advantages to hiding and we do see disadvantages to it. The
protection operates at two levels. First, all
indexing references are forbidden by setting the option "NoIndexing" in the
class. Secondly, the fields mentioned are given a special name, identical
in appearance to the defined one, but different from what a user can type.
All methods referring to the class will have these names fixed
appropriately. Additional calls to Protect can be used to Protect names not
yet protected and hence create a hierarchy of protected names and functions
that can use them.
Examples:
> Protect( Polar, Rho, Theta);
See also: ?CompleteClass ?ExtendClass ?Inherit ?objectorientation ?option
PruneTree
Function PruneTree
Calling Sequence: PruneTree(t,contains)
Parameters:
Name Type Description
---------------------------------------------------------------------
t Tree
contains {list,procedure,set} labels remaining in the pruned tree
Returns:
Tree
Synopsis: This function returns a pruned version of the input tree containing
only those leaves whose labels are members of the set or list 'contains' or,
if 'contains' is a procedure, for which 'contains()' applied to the Leaf()
structure evaluates to true.
Examples:
> T := Tree( Leaf('a', 2), 0.5, Tree(Leaf('b',1.5),0.7,Leaf('e', 1)) );
T := Tree(Leaf(a,2),0.5000,Tree(Leaf(b,1.5000),0.7000,Leaf(e,1)))
> PruneTree( T, ['a','b'] );
Tree(Leaf(a,2),0.5000,Leaf(b,1.5000))
See also: ?Leaf ?RotateTree ?Tree
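The pruning rule can be sketched in Python. This is an illustration under stated assumptions, not the Darwin source: trees are modelled as plain tuples ("Leaf", label, height) and ("Tree", left, height, right), and an internal node left with a single child is replaced by that child, which is the behaviour suggested by the example above.

```python
def prune_tree(t, contains):
    """Keep only the leaves accepted by `contains`; return None if none survives.

    `contains` is either a collection of labels or a predicate on a leaf tuple.
    """
    keep = contains if callable(contains) else (lambda leaf: leaf[1] in contains)
    if t[0] == "Leaf":
        return t if keep(t) else None
    _, left, height, right = t
    left = prune_tree(left, keep)
    right = prune_tree(right, keep)
    if left is None:
        return right   # node collapses to its remaining child (or None)
    if right is None:
        return left
    return ("Tree", left, height, right)


if __name__ == "__main__":
    T = ("Tree", ("Leaf", "a", 2), 0.5,
         ("Tree", ("Leaf", "b", 1.5), 0.7, ("Leaf", "e", 1)))
    print(prune_tree(T, ["a", "b"]))
```

Run on the tree from the example, the sketch keeps leaves a and b under the original root, mirroring the Darwin result shown above.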
RAxML
Function RAxML - Wrapper for RAxML, a ML tree reconstruction tool
Calling Sequence: RAxML(msa)
Parameters:
Name Type Description
-----------------------------------------------------------------------------------------
msa {MAlignment,list(string)} Multiple Sequence Alignment
labels list(string) (optional) Sequence Labels
subst string (optional) substitution model
inv_sites inv_sites=boolean (optional) Estimate invariant sites
estimate_basefreqs estimate_basefreqs=boolean (optional) Estimate base frequencies
start_tree start_tree={Tree,string} (optional) start tree for search
nr_runs nr_runs=posint (optional) number of ML tree searches
bootstrap bootstrap={0,posint} (optional) number of bootstrap samples
rates rates=string (optional) Rates model
threaded threaded=integer (optional) number of threads to be used
eps eps=positive (optional) stop criteria for ML search
Returns:
TreeResult
Synopsis: RAxML is a tool to compute maximum likelihood trees from multiple
sequence alignments. For details see manual (Reference section).
Available substitution models:
GONNET matrices
subst=string GONNET, JTT, DAYHOFF, WAG, BLOSUM62, MTREV, RTREV, CPREV,
MTMAM, VT (all Protein) or GTR (DNA)
Available model modifiers:
inv_sites=boolean estimate proportion of invariant sites.
Default=false
estimate_basefreqs=boolean estimate the base frequencies from the data,
otherwise use fixed frequencies from the model.
Default=false
rates=string choice of rates implementation. Available are
'CAT', 'GAMMA' and 'MIX'. 'CAT' classifies each site
into a fixed rate category. Likelihoods between
different topologies are not comparable and thus,
the method is only available in combination with
'nr_runs'=1. 'GAMMA' uses 4 discrete rate categories
according to a gamma distribution and estimates the
alpha parameter. 'MIX' searches for a good
topology using the 'CAT' model and switches
afterwards to the 'GAMMA' model to compute stable
likelihoods. default='MIX'.
Parameters determine exhaustiveness of reconstruction:
nr_runs=posint determines the number of ML tree searches on the
original multiple sequence alignment. default=10
bootstrap={0,posint} determines the number of bootstrap samples to be
evaluated. Default=0
start_tree={Tree,'MP','random'} specifies the start topology for the ML
search. 'MP' uses for each run a different maximum
parsimony tree and 'random' starts with a random
topology. Alternatively, you can pass a starting
topology. Default='MP'
eps=positive ML search will be stopped if the likelihood
increased by less than 'eps'. Default=0.1
Other parameters:
threaded=integer specifies the number of threads. If set to <= 1, the
sequential program is used. Default=1.
References: Alexandros Stamatakis. RAxML-VI-HPC: Maximum Likelihood-based
Phylogenetic Analyses with Thousands of Taxa and Mixed Models,
Bioinformatics 22(21):2688-2690, 2006. Source code and Manual:
http://icwww.epfl.ch/~stamatak/index-Dateien/Page443.htm
Examples:
> msa := Rand(MAlignment):;
> RAxML(msa, 'nr_runs'=2,'bootstrap'=100,'inv_sites'=true);
TreeResult(Tree(Tree(Leaf(RandSeq9,0.2961),0,Tree(Leaf(RandSeq6,0.2228),0.2127,
Tree(Tree(Leaf(RandSeq2,0.4692),0.4692,Leaf(RandSeq4,0.6664),73),0.4559,Tree(
Tree(Leaf(RandSeq8,1.3377),0.9953,Leaf(RandSeq5,1.0007),100),0.7236,Tree(Leaf(
RandSeq1,0.7442),0.7398,Tree(Leaf(RandSeq10,1.5340),1.0507,Leaf(RandSeq3,1.0612)
,100),95),100),100),100)),0,Leaf(RandSeq7,1.0473e-06)),ML,table([{[Likelihood, [
-3120.6126]]}, {}, {}, {[InvSites, [0.00011700]]}, {}, {[SubstModel, [GONNET]]},
{[CPUtime, [21.6900]]}, {}, {[Method, [RAxML]]}, {[Alpha, [1000.0997]]}, {}, {},
{}],unassigned))
See Also:
?LeastSquaresTree ?MAlign ?RobinsonFoulds ?TreeResult
?MafftMSA ?PhylogeneticTree ?Tree
RBFS_Tree
Function RBFS_Tree - apply heuristics to improve a distance tree
Calling Sequence: RBFS_Tree(t,Dist,Var)
Parameters:
Name Type Description
-------------------------------------------------------------------
t Tree input distance Tree
Dist matrix(numeric) distance matrix used to build Tree
Var matrix(numeric) variances of the distances
'Top' = posint (default=1) number of best trees to return
Returns:
set([numeric, Tree])
Synopsis: RBFS_Tree is a method for improving distance phylogenetic trees
using heuristics. The first heuristic, called Reduce Best Fitting Subtree
(RBFS), selects a set of subtrees which are highly consistent and whose fit
is of good quality, replaces them with a single leaf and attempts to
optimize the reduced tree. The second heuristic chooses, from different
trees, subtrees which are on the same set of leaves and tries to graft them
together hoping that the resulting tree is better. RBFS_Tree returns a set
of pairs: [DimensionlessFit,Tree]. The number of trees returned can be
changed with the optional parameter Top=n. The trees returned are the ones
which have the highest quality (lowest DimensionlessFit value).
See Also:
?BootstrapTree ?Leaf ?SignedSynteny
?ComputeDimensionlessFit ?LeastSquaresTree ?Synteny
?GapTree ?PhylogeneticTree ?Tree
RGB_string
Function RGB_string - convert an RGB vector into a color name
Calling Sequence: RGB_string(rgb)
RGB_string(r,g,b)
Parameters:
Name Type Description
-----------------------------------------------------
rgb list(nonnegative) an RGB vector of length 3
r nonnegative intensity for red (0..1)
g nonnegative intensity for green (0..1)
b nonnegative intensity for blue (0..1)
Returns:
string
Synopsis: This function converts a 3 value RGB vector into a color name. The
vector contains the values for red, green and blue in a scale of 0 to 1.
Black is [0,0,0] and white is [1,1,1]. The matching is approximate: the
result is the name of the table entry closest in Euclidean distance to the
given vector. About 650 colours are known to this function. The full list
can be found at lib/Color.
Examples:
> RGB_string([0,0,0]);
black
> RGB_string(0.5,1,0);
chartreuse
> RGB_string(.8,.4,.1);
chocolate3
See also: ?Color ?DrawTree ?string_RGB
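The nearest-name matching described in the Synopsis can be sketched in Python. The table below is a tiny illustrative stand-in for the roughly 650 entries in lib/Color. Its RGB values are assumed from the common X11 colour definitions and are not taken from the Darwin library, and the function name is hypothetical.

```python
# Illustrative colour table (assumed X11 values scaled to 0..1), not lib/Color.
COLOR_TABLE = {
    "black": (0.0, 0.0, 0.0),
    "white": (1.0, 1.0, 1.0),
    "red": (1.0, 0.0, 0.0),
    "chartreuse": (127 / 255, 1.0, 0.0),
    "chocolate3": (205 / 255, 102 / 255, 29 / 255),
}


def nearest_color_name(rgb, table=COLOR_TABLE):
    """Name of the table entry with the smallest Euclidean distance to rgb."""
    def squared_distance(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, table[name]))
    # Minimising the squared distance gives the same entry as the distance itself.
    return min(table, key=squared_distance)


if __name__ == "__main__":
    print(nearest_color_name((0.5, 1, 0)))
    print(nearest_color_name((0.8, 0.4, 0.1)))
```

With a larger table the result may differ from this sketch, since a closer entry may exist.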
RSCU
Function RSCU - Relative synonymous codon usage
Calling Sequence: RSCU()
RSCU(dna)
Parameters:
Name Type Description
---------------------------------------------
dna string optional string of coding DNA
Returns:
list
Synopsis: The function RSCU returns the relative synonymous codon usage of an
organism if no argument is given. If a string of coding DNA is given, the
relative synonymous codon usage for the string is returned. Relative
synonymous codon usage values are estimated as the ratio of the observed
codon usage to the value expected if usage were uniform within synonymous
groups. The RSCU for a codon i is
RSCU_i = X_i / ((1/n) * sum(X_j, j=1..n)),
where X_i is the number of times the ith codon has been used for a given
amino acid, and n is the number of synonymous codons for that amino acid.
References: Sharp PM, Tuohy TMF, and Mosurski KR (1986). Codon usage in
yeast: Cluster analysis clearly differentiates highly and lowly expressed
genes. Nucleic Acids Research 14:5125-5143.
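The ratio of observed to uniformly expected codon usage can be worked through in Python. This is a sketch of the calculation for a DNA string, not the Darwin implementation. It assumes the standard genetic code, skips stop codons and amino acids that do not occur in the input, and returns a dictionary rather than Darwin's list. The function name is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order (assumed; '*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(dna):
    """Map each codon of every amino acid present in dna to X_i / (sum(X_j) / n)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3            # ignore a trailing partial codon
    counts = Counter(dna[i:i + 3] for i in range(0, usable, 3))
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    result = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue                            # no expectation can be formed
        expected = total / len(codons)          # uniform usage within the group
        for c in codons:
            result[c] = counts[c] / expected
    return result


if __name__ == "__main__":
    values = rscu("GCTGCTGCC")                  # three alanine codons
    print({c: round(v, 4) for c, v in values.items()})
```

By construction, the values within one synonymous group add up to n, so a value of 1 means the codon is used exactly as often as uniform usage would predict.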
Rand
Function Rand
Options: builtin, numeric and polymorphic
Calling Sequence: Rand()
Returns:
numeric
Synopsis: This function returns a random number uniformly distributed between
0 and 1. The random number generator has the seed set by either the
function SetRand or SetRandSeed. Any class which is completed with the
command CompleteClass will have an automatically generated Rand function,
i.e. random objects of the class can be generated. The following table
describes the possible arguments of Rand and the object that will be
generated.
argument random structure
------------------------------------------------------------------------------
Alignment random alignment
array(t,d1,...) array of dimensions d1,... with entries of type t
Beta(a,b) Beta distributed number with average a/(a+b)
Binomial(n,p) integer binomially distributed, ave n*p, var n*p*(1-p)
Multinomial(n,ps) multinomially distributed integers
ChiSquare(nu) chi-square distributed number with ave nu, var 2*nu
CodingDNA(n) random DNA coding sequence (no stops) with n bases
DNA(n) random DNA sequence with n bases. Uses the global
vector AF, if suitable
Entry a random entry from the database in DB
Exponential(a,b) exponentially distributed number with ave a+b, var b^2
FDist(nu1,nu2) random F distributed or Variance-ratio number
GammaDist(p) random Gamma distributed number with ave p and var p
Geometric(p) geometrically distributed integer with ave (1-p)/p
Graph(n,m) random graph with n vertices and m edges
integer random integer
[t1,t2,...] a list with random components of the given types
LongInteger random extended precision integer
MAlignment random multiple sequence alignment
matrix(t) matrix with random dimensions and random entries of type t
Normal(a,b) normally distributed variable with ave a and var b
MNormal(a,b) multivariate normal with ave vector a and cov matrix b
Poisson(m) Poisson distributed integer with average and variance m
Polar complex number in Polar representation
posint random positive integer
Protein(n) a random sequence of amino acids of length n
Sequence the sequence of a random entry from the database in DB
a..b integers or numbers (depending on type of a,b) in the range
{a,b,...} a random value from the set
Stat results of univariate statistics
string random (readable) string
Student(nu) Student distributed variable with parameter nu
SvdResult results of an Svd least squares approximation
Tree random distance tree
a random object of this type
Examples:
> SetRand(5);
> Rand();
0.8649
> Rand();
0.6743
> Rand(Normal);
0.6467
> Rand(Binomial(20,0.2));
2
> Rand(Poisson(55));
43
> Rand(Geometric(0.2));
3
> Rand(Exponential(1.2,3));
3.8216
See Also:
?Beta_Rand ?Exponential_Rand ?Multinomial_Rand ?Shuffle
?Binomial_Rand ?FDist_Rand ?Normal_Rand ?StatTest
?ChiSquare_Rand ?GammaDist_Rand ?Poisson_Rand ?Std_Score
?CreateRandSeq ?Geometric_Rand ?SetRand ?Student_Rand
?Cumulative ?Graph_Rand ?SetRandSeed ?Zscore
Rank
Function Rank - Computes sample ranks
Calling Sequence: Rank(l)
Rank(l,p)
Parameters:
Name Type Description
--------------------------------------------------
l list a list of values
p {procedure} (optional) ordering procedure
Returns:
list
Synopsis: This function returns the sample ranks of a list of values. Tied
(i.e. equal) values each receive the average of the ranks they span.
Examples:
> Rank( [4,6,1,5,6,9,1,3,3] );
[5, 7.5000, 1.5000, 6, 7.5000, 9, 1.5000, 3.5000, 3.5000]
> Rank( [4,6,1,5,6,9,1,3,3], x->-x);
[5, 2.5000, 8.5000, 4, 2.5000, 1, 8.5000, 6.5000, 6.5000]
See also: ?avg ?cor ?sort ?std ?sum ?var
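The tie handling can be illustrated with a short Python sketch. It is not the Darwin implementation. The function name is hypothetical, and the optional second argument is treated as a key function applied to each value before comparison, an assumption based on the x->-x example above.

```python
def average_ranks(values, key=None):
    """1-based ranks of values; equal values share the average of their ranks."""
    key = key or (lambda x: x)
    order = sorted(range(len(values)), key=lambda i: key(values[i]))
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and key(values[order[j + 1]]) == key(values[order[i]]):
            j += 1
        shared = (i + j) / 2 + 1      # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


if __name__ == "__main__":
    print(average_ranks([4, 6, 1, 5, 6, 9, 1, 3, 3]))
    print(average_ranks([4, 6, 1, 5, 6, 9, 1, 3, 3], key=lambda x: -x))
```

For the two inputs of the Examples section the sketch yields the same ranks as the Darwin output shown there.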
ReadBrk
Function ReadBrk
Calling Sequence: ReadBrk(fname)
ReadBrk(fname,tags = taglist)
Parameters:
Name Type Description
---------------------------------------------------------------
fname string file name with the Brookhaven database
taglist list(string) list of tags to be included
Returns:
NULL
Global Variables: chains
Synopsis: Read a Brookhaven database file into a Fold() data structure.
Specify "compressed=true" as an argument if the file should be read by
"zcat". The default taglist is HEADER, SOURCE, SEQRES, ATOM.
ReadData
Function ReadData - read a formatted file
Calling Sequence: ReadData(filename,fmt)
Parameters:
Name Type Description
-------------------------------------------
filename string name of file to be read
fmt string a valid sscanf format
Returns:
list(list)
Synopsis: ReadData opens and reads the file and scans each line with the
format given. The result of the scan is stored in a list which is returned.
Normally this will be a matrix (list of lists). If there are format errors,
a message is printed and the process continues up to 100 errors.
See Also:
?FileStat ?MySql ?ReadProgram ?ReadRawLine
?LockFile ?OpenReading ?ReadRawFile ?SearchDelim
ReadDb
Function ReadDb
Calling Sequence: ReadDb(fname)
Parameters:
Name Type Description
----------------------------------
fname string sequence database
Returns:
database
Global Variables: DB
Synopsis: The function loads the sequence database located in the file
fname. The contents of the file must be in the Darwin ISO-SGML format. By
default, this sequence database is assigned to the system variable DB unless
another variable is specified. fname may include a path. If fname ends in
".gz" or ".Z", then it is assumed to be a compressed file and it is
decompressed before reading.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
See also: ?ConsistentGenome ?DB ?Entry ?GenomeSummary ?MySql
ReadDssp
Function ReadDssp
Calling Sequence: ReadDssp(fname)
Parameters:
Name Type
----------------
fname filename
Returns:
NULL
Global Variables: chains
Synopsis: Read a DSSP formatted database file into a Fold() data structure.
Specify "compressed=true" as an argument if the file should be read by
"zcat". Specify "tags=[taglist]" as an argument to read selected tags. The
default taglist is HEADER, SOURCE.
ReadFasta
Function ReadFasta - load fasta sequence file
Calling Sequence: ReadFasta(fn)
Parameters:
Name Type Description
---------------------------
fn string filename
Returns:
list(string) : list(string)
Synopsis: ReadFasta loads a file with fasta sequences and returns a list of
sequences and a list of ids.
See Also:
?FileStat ?OpenReading ?ReadLibrary ?ReadPima
?LockFile ?OpenWriting ?ReadLine ?ReadPir
?MySql ?ReadBrk ?ReadMap ?ReadProgram
?OpenAppending ?ReadDb ?ReadMsa ?ReadRawFile
?OpenPipe ?ReadDssp ?ReadOffsetLine
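The shape of the return value, a list of sequences followed by a list of ids, can be illustrated with a minimal FASTA parser in Python. This is a sketch only and makes no claim about how Darwin parses the file. It assumes that the id is the full header line after the ">" and that sequence lines are concatenated. The function name is hypothetical.

```python
def parse_fasta(text):
    """Return (sequences, ids) parsed from FASTA-formatted text."""
    seqs, ids = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                       # skip blank lines
        if line.startswith(">"):
            ids.append(line[1:].strip())   # header line starts a new record
            seqs.append("")
        elif seqs:
            seqs[-1] += line               # sequence may span several lines
    return seqs, ids


if __name__ == "__main__":
    sample = ">s1 first\nACGT\nAC\n>s2\nGG\n"
    seqs, ids = parse_fasta(sample)
    print(seqs, ids)
```

Text that appears before the first header is ignored by this sketch.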
ReadLibrary
Function ReadLibrary
Option: builtin
Calling Sequence: ReadLibrary(filename)
ReadLibrary(filename,funcname)
Parameters:
Name Type Description
------------------------------------------------------
filename string procedure name or library filename
funcname symbol procedure name
Returns:
procedure
Synopsis: If only filename is supplied as a parameter, this function loads
the contents of filename located in the user's local Darwin library and
returns the function with the supplied name. If there is a
second parameter, the first one is used to load the file from the library
and the second should be a procedure name which is loaded in that file.
ReadLibrary returns the procedure named in the second argument. With two
arguments, if the filename starts with a slash ("/"), then it is assumed to
be an absolute path name and the library name (stored in libname) will not
be prepended to it. The location of the Darwin library is set with the -l
flag when initiating your Darwin session and is kept in the global variable
"libname".
One of the main uses of ReadLibrary is to provide a mechanism for automatic
loading of functions from the library. By assigning a name with an
unevaluated call to ReadLibrary (with the appropriate parameters), when the
function is used (and its name is evaluated), it will produce the actual
reading of the library. Since reading the library is likely to assign the
function name with a proc (or something else), the unevaluated ReadLibrary
will be obliterated and the reading of the library happens only once. This
mechanism allows efficient reading of library functions from many points;
the first read will be the only one executed. The file "darwinit" in the
Darwin library provides the definitions of all system-defined functions and
many different examples of its use.
Examples:
> ReadLibrary(MultiAlign);
> ReadLibrary(MultiAlign, AnchorAlign);
See also: ?libname ?ReadProgram ?ReadRawFile
ReadLine
Function ReadLine - reads a darwin command in a single line
Option: builtin
Calling Sequence: ReadLine()
ReadLine(t)
Parameters:
Name Type Description
--------------------------------
t string a prompt string
Returns:
anything
Synopsis: Reads one statement from the current input stream, evaluates the
statement and returns its value. The string t is a prompt which is sent to
the standard output directly before reading from the standard input. This
statement should only be used from within a procedure.
Examples:
> x := proc()
t := ReadLine('prompt: ');
lprint('The user entered: ',t);
end;
> x();
prompt: 1+3;
The user entered: 4
See Also:
?FileStat ?OpenPipe ?ReadRawFile ?SplitLines
?inputoutput ?OpenReading ?ReadRawLine
?LockFile ?ReadData ?ReadURL
?MySql ?ReadOffsetLine ?SearchDelim
ReadOffsetLine
Function ReadOffsetLine - Reads one statement from a file at a given offset
Option: builtin
Calling Sequence: ReadOffsetLine(filename,ofs)
Parameters:
Name Type Description
------------------------------------------------------
filename filename a filename from which to be read
ofs posint an offset into the file
Returns:
NULL
Synopsis: Reads one statement starting at offset ofs in the file filename.
Examples:
See Also:
?FileStat ?OpenAppending ?ReadData ?ReadRawLine ?SplitLines
?inputoutput ?OpenReading ?ReadLine ?ReadURL
?LockFile ?OpenWriting ?ReadRawFile ?SearchDelim
ReadPhylip
Function ReadPhylip
Calling Sequence: ReadPhylip(fname)
Parameters:
Name Type
-------------------
fname a file name
Returns:
list
Synopsis: ReadPhylip opens the file indicated by fname, assumes that it is an
MSA in PHYLIP format and parses its content. The return value consists of a
list containing a list of sequences plus a list of corresponding labels.
Examples:
> ReadPhylip('myphylipfile.phy');
See Also:
?FileStat ?OpenReading ?ReadFasta ?ReadOffsetLine
?LockFile ?OpenWriting ?ReadLibrary ?ReadPima
?MySql ?ReadBrk ?ReadLine ?ReadPir
?OpenAppending ?ReadDb ?ReadMap ?ReadProgram
?OpenPipe ?ReadDssp ?ReadMsa ?ReadRawFile
ReadProgram
Function ReadProgram
Option: builtin
Calling Sequence: ReadProgram(fname)
Parameters:
Name Type
-------------------------------------
fname a string which is a file name
Returns:
NULL
Synopsis: ReadProgram opens the file indicated by fname. The file name
should be readable from the directory where Darwin is being executed. The
file is expected to contain valid Darwin statements. All statements in the
file are read and are only echoed if printlevel is sufficiently high. The
effect of the statements read is as if they were executed at the top level,
even when ReadProgram is called inside a function.
Examples:
> ReadProgram(test);
See Also:
?FileStat ?OpenReading ?ReadFasta ?ReadOffsetLine
?LockFile ?OpenWriting ?ReadLibrary ?ReadPhylip
?MySql ?ReadBrk ?ReadLine ?ReadPima
?OpenAppending ?ReadDb ?ReadMap ?ReadPir
?OpenPipe ?ReadDssp ?ReadMsa ?ReadRawFile
ReadRawFile
Function ReadRawFile
Option: builtin
Calling Sequence: ReadRawFile(filename)
Parameters:
Name Type Description
--------------------------------------------------------------
filename string name of file to be read as a single string
Returns:
string
Synopsis: Read an entire file (returned as a single string) given by its
filename.
Examples:
See Also:
?FileStat ?OpenAppending ?ReadLine ?SearchDelim
?inputoutput ?OpenReading ?ReadOffsetLine ?SplitLines
?LockFile ?OpenWriting ?ReadRawLine
?MySql ?ReadData ?ReadURL
ReadRawLine
Function ReadRawLine - read a line as a string
Option: builtin
Calling Sequence: ReadRawLine()
ReadRawLine(t)
Parameters:
Name Type Description
--------------------------------
t string a prompt string
Returns:
string
Synopsis: Reads one line from the current input stream and returns it as a
string. When the input file is exhausted, the next call to ReadRawLine will return
the string EOF. The string t, if provided, is a prompt which is sent to the
standard output directly before reading from the standard input. This
statement should not be used in interactive mode or in the middle of a
program which is being read from the input stream, as there is bound to be
confusion between the program and the data. It is recommended to use it
inside a function/procedure.
Examples:
> OpenPipe(date);
> ReadRawLine();
Thu Oct 12 08:01:39 MET DST 2000
> ReadRawLine();
EOF
> x := proc()
t := ReadRawLine('prompt: ');
lprint('The user entered: ',t);
end;
> x();
prompt: 1+3;
The user entered: 1+3;
See Also:
?FileStat ?OpenReading ?ReadRawFile ?SplitLines
?inputoutput ?ReadData ?ReadURL
?LockFile ?ReadLine ?SearchDelim
?OpenPipe ?ReadOffsetLine ?ServerSocket
ReadTable
Function ReadTable( file:string )
Read utility similar to Splus read.table () function.
Optional arguments: Sep = string, Skip = integer, Format = string,
Prog = string and Format2 = [string].
Note: Format bypasses the Sep=string mechanism. E.g.
ReadTable (somefile.gz, Skip = 1, Format = '%s%d%d', Prog = gzcat).
ReadTcp
Function ReadTcp( timeout:{0,posint} )
Waits up to timeout seconds to receive data from another machine
and execute it as Darwin commands. Returns (machine:string, pid:posint)
or NULL if not successful.
ReadURL
Function ReadURL
Calling Sequence: ReadURL(url)
Parameters:
Name Type Description
---------------------------
url string a URL
Returns:
string
Synopsis: Reads a URL and returns it as a string. Works the same way as
ReadRawFile, just with URLs instead of filenames.
See Also:
?DownloadURL ?OpenReading ?ReadLine ?SearchDelim
?OpenAppending ?OpenWriting ?ReadRawLine ?SplitLines
Readability
Function Readability - statistical index of readability
Calling Sequence: Readability(s,Counts,Ctype)
Parameters:
Name Type Description
-----------------------------------------------------------------------
s string input text to compute readability index
Counts array(integer,26,26) optional statistical frequencies
Ctype symbol optional name of frequencies
Returns:
numeric
Synopsis: Readability computes an index based on how well the text follows a
given set of probabilities of pairs of characters. The probabilities are
computed from a 26 x 26 matrix of counts of occurrences of pairs of letters.
Non-letters (including spaces) are ignored and case is not significant. The
following names of frequencies are implemented:
Ctype Description
-----------------------------------------------------------
English digraph frequencies for Shakespeare
VowCon (default) vowel-consonant pairs only
VowConF vowel-consonant pairs including letter frequencies
Spanish digraph frequencies for Spanish
Examples:
> Readability('To be or not to be that is the question',English);
38.0937
> Readability('En un lugar de la Mancha de cuyo nombre no me acuerdo',English);
22.8186
> Readability(ASILITE,VowCon);
11.1734
See also: ?Mutate ?Rand ?Sequence
ReceiveDataTcp
Function ReceiveDataTcp( timeout:{0,posint} )
Waits up to timeout seconds to receive data from another machine.
Returns (machine:string, pid:posint, data:string) or NULL if not successful.
ReceiveTcp
Function ReceiveTcp
Option: builtin
Calling Sequence: ReceiveTcp(timeout)
Parameters:
Name Type Description
--------------------------------------------------
timeout {0,posint} seconds to wait for timeout
Returns:
{string}
Synopsis: Waits up to timeout seconds to receive data from the IPC daemon.
This command is usually preceded by a SendTcp. Returns NULL if no data is
received (i.e. timeout occurred).
Examples:
> r := traperror(ConnectTcp('/tmp/.ipc/darwin', false));
> SendTcp('PING'); r := ReceiveTcp(3);
r := PING OK
> SendTcp('MSTAT linneus1'); r := ReceiveTcp(3);
r := DATA linneus1 0:OK ALIVE
> DisconnectTcp();;
See Also:
?ConnectTcp ?ipcsend ?ParExecuteTest ?SendTcp
?darwinipc ?ParExecuteIPC ?ReceiveDataTcp
?DisconnectTcp ?ParExecuteSlave ?SendDataTcp
ReconcileTree
Function ReconcileTree - Reconciles a gene tree with a species tree
Calling Sequence: ReconcileTree(g,s,g2s)
ReconcileTree(g,s,g2s,reroot)
Parameters:
Name Type Description
----------------------------------------------------------------
g Tree Gene Tree
s {OVERLAP,Tree} Species Tree or Species Overlap method
g2s procedure mapping function from gene to species
reroot boolean (optional) reroot gene tree
Returns:
list
Synopsis: The function ReconcileTree infers gene duplication and speciation
events on a gene tree by comparing it to a TRUSTED species tree.
Alternatively, if no trusted species tree exists, one can use the species
overlap reconciliation method by passing 'OVERLAP' as the species
tree. The function g2s is a mapping function from the gene name to its
species.
If reroot is set to 'true' (the default is false), the function reroots the
gene tree on every possible branch and reconciles all those trees. It
returns the rooted gene tree that minimizes the number of duplication
events.
The function returns the reconciled gene tree and the number of duplication
events on it. The events are stored in the 'XTRA' field of the tree: 'D=Y'
and 'D=N' indicate whether the node represents a duplication or speciation
event respectively.
References: Zmasek CM and Eddy SR. A simple algorithm to infer gene
duplication and speciation events on a gene tree. Bioinformatics, 2001,
17(9):821-828.
van der Heijden RT et al. Orthology prediction at scalable
resolution by phylogenetic tree analysis. BMC Bioinformatics, 2007, 8:83.
Examples:
> GeneTree := Tree(Tree(Tree(Leaf(a_HUMAN, 3), 2, Leaf(a_YEAST,3)),
1, Leaf(b_BOVIN,3)), 0, Leaf(c_HUMAN,3));
GeneTree := Tree(Tree(Tree(Leaf(a_HUMAN,3),2,Leaf(a_YEAST,3)),1,Leaf(b_BOVIN,3)),0,Leaf(c_HUMAN,3))
> SpeciesTree := Tree(Tree(Leaf(HUMAN,2),1,Leaf(BOVIN,2)),0,Leaf(YEAST,2));
SpeciesTree := Tree(Tree(Leaf(HUMAN,2),1,Leaf(BOVIN,2)),0,Leaf(YEAST,2))
> SwissProtID := x -> x[SearchString('_',x)+2..-1];
SwissProtID := x -> x[SearchString(_,x)+2..-1]
> tree := ReconcileTree(GeneTree, SpeciesTree, SwissProtID);
tree := [Tree(Tree(Tree(Leaf(a_HUMAN,3),2,Leaf(a_YEAST,3),D=N),1,Leaf(b_BOVIN,3),D=Y),0,Leaf(c_HUMAN,3),D=Y), 2]
> tree := ReconcileTree(GeneTree, 'OVERLAP', SwissProtID);
tree := [Tree(Tree(Tree(Leaf(a_HUMAN,3),2,Leaf(a_YEAST,3),D=N),1,Leaf(b_BOVIN,3),D=N),0,Leaf(c_HUMAN,3),D=Y), 1]
See Also:
?BipartiteSquared ?LeastSquaresTree ?RobinsonFoulds ?Tree
?IntraDistance ?PhylogeneticTree ?RotateTree
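The species overlap rule described in the synopsis can be sketched outside
Darwin. The Python below is an illustration only, not Darwin's implementation.
It assumes a gene tree written as nested 2-tuples of leaf names, and the
helper names species_of and overlap_events are hypothetical. A node is called
a duplication when the species sets of its two subtrees share a species.

```python
def species_of(gene_name):
    # Hypothetical mapping: the text after the first underscore,
    # e.g. 'a_HUMAN' -> 'HUMAN'.
    return gene_name.split('_', 1)[1]


def overlap_events(tree, g2s=species_of):
    """Return (species_set, events); events holds 'D' (duplication) or
    'S' (speciation) for every internal node, in post-order."""
    if isinstance(tree, str):
        return {g2s(tree)}, []
    left, right = tree
    left_species, left_events = overlap_events(left, g2s)
    right_species, right_events = overlap_events(right, g2s)
    # Shared species below both children imply a duplication.
    event = 'D' if left_species & right_species else 'S'
    return left_species | right_species, left_events + right_events + [event]


# Same topology as GeneTree in the examples above (heights omitted).
gene_tree = ((('a_HUMAN', 'a_YEAST'), 'b_BOVIN'), 'c_HUMAN')
species, events = overlap_events(gene_tree)
duplications = events.count('D')
```

On this tree only the root has overlapping species sets, which agrees with
the single duplication reported by the 'OVERLAP' example.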
RedoCompletion
Function RedoCompletion - Rewrite the file listing commands for shell
autocompletion
Calling Sequence: RedoCompletion()
Returns:
NULL
Synopsis: This function rewrites the file "cmds" in the library with the list
of all functions defined in the current session (for this purpose, it uses
the function names()).
See also: ?libname ?names
Region
Data structure Region( )
Structure to hold a gene region.
- Selectors:
Nr: set, Start: posint, End: posint, StartFrame: posint, EndFrame: posint,
FloatStart: boolean, FloatEnd: boolean, Sim: numeric, BestNr: posint,
MinShifts: integer, MaxShifts: integer
- Format:
Region(Nr,Start,End,StartFrame,EndFrame,FloatStart,FloatEnd,Sim,BestNr,
MinShifts,MaxShifts).
RegularGraph
Function RegularGraph - generate a random regular graph
Calling Sequence: RegularGraph(n,e)
Parameters:
Name Type Description
--------------------------------------------------
n integer optional number of nodes/vertices
e integer optional degree of each vertex
Returns:
Graph
Synopsis: Generate a random graph where each of the n vertices has the same
degree e. The product n*e must be even.
Examples:
> RegularGraph(5,2);
Graph(Edges(Edge(0,1,2),Edge(0,1,4),Edge(0,2,5),Edge(0,3,4),Edge(0,3,5)),Nodes(1,2,3,4,5))
See Also:
?BipartiteGraph ?Graph_minus ?Nodes
?Clique ?Graph_Rand ?ParseDimacsGraph
?DrawGraph ?Graph_XGMML ?Path
?Edge ?InduceGraph ?ShortestPath
?EdgeComplement ?MaxCut ?TetrahedronGraph
?Edges ?MaxEdgeWeightClique ?VertexCover
?FindConnectedComponents ?MinCut
?Graph ?MST
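The constraint that n*e must be even follows from every edge using two
vertex "slots". The Python sketch below illustrates one common way to draw
such a graph (random pairing of slots with rejection). It is not Darwin's
algorithm; the function name regular_graph and the restriction to simple
graphs (no loops, no repeated edges) are assumptions made for the example.

```python
import random


def regular_graph(n, degree, rng=None):
    """Return a sorted list of edges (u, v) on vertices 1..n in which
    every vertex has exactly `degree` neighbours."""
    rng = rng or random.Random()
    if (n * degree) % 2 != 0:
        raise ValueError("n * degree must be even")
    if degree >= n:
        raise ValueError("degree must be smaller than n for a simple graph")
    while True:
        # One slot per edge end: each vertex appears `degree` times.
        stubs = [v for v in range(1, n + 1) for _ in range(degree)]
        rng.shuffle(stubs)
        edges = set()
        valid = True
        for i in range(0, len(stubs), 2):
            u, v = stubs[i], stubs[i + 1]
            edge = (min(u, v), max(u, v))
            if u == v or edge in edges:
                valid = False  # loop or repeated edge: draw again
                break
            edges.add(edge)
        if valid:
            return sorted(edges)
```

For n = 5 and degree 2 this yields 5 edges, the same count as in the
RegularGraph(5,2) example above.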
RelativeAdaptiveness
Function RelativeAdaptiveness - Calculate the relative adaptiveness
Calling Sequence: RelativeAdaptiveness([e])
Returns:
list
Synopsis:
See also: ?ComputeCAI ?SetupRA
RellTree
Function RellTree - does RELL on a TreeResult
Calling Sequence: RellTree(TreeResult,nrOfBootstraps)
Parameters:
Name Type Description
----------------------------------------------------------------------
TreeResult TreeResult the TreeResult object in question
nrOfBootstraps posint (opt) desired number of bootstrap values
Synopsis: Applies RELL (resampling of estimated log likelihood values) to a
TreeResult object that contains log likelihoods per site (e.g. from a PhyML
run). See Kishino et al., MBE 1990, for more information.
See also: ?PhyML ?Tree
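The idea behind RELL can be sketched as follows. This Python is a generic
illustration of the resampling step, not the code behind RellTree. The input
layout (one list of per-site log likelihoods per candidate tree) and the name
rell_support are assumptions, and what RellTree itself returns is not
documented here.

```python
import random


def rell_support(site_lnl, n_bootstraps=1000, rng=None):
    """site_lnl[t][s] is the log likelihood of site s under tree t.
    Returns, per tree, the fraction of replicates in which it scored best."""
    rng = rng or random.Random()
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(n_bootstraps):
        # Resample site indices with replacement; no likelihood is recomputed.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[s] for s in sample) for tree in site_lnl]
        best = max(range(n_trees), key=totals.__getitem__)
        wins[best] += 1
    return [w / n_bootstraps for w in wins]
```

Because only stored per-site values are summed again, each replicate is much
cheaper than a full bootstrap that re-optimizes the trees.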
RenderTemplate
Function RenderTemplate - Substitutes placeholders in template file with user
variables
Calling Sequence: RenderTemplate(file,tab)
Parameters:
Name Type Description
------------------------------------
file string filename of template
tab table substitution table
Returns:
string
Synopsis: Return the content of the template file with substituted
placeholders. Three different placeholders are supported in the template
file: 1) '', where the whole tag gets replaced by the value
in the substitution table. 2) ' ... ', where ...
is ignored if the variable XXX is false and inserted if it is true
respectively. 3) '...' indicates a loop section,
where '' occurrences are replaced with the appropriate
values from tab[XXX,i,YYY], for all possible i. In this case, the value of
tab[XXX] needs to be a list of tables.
See Also:
?Block ?Document ?List ?screenwidth
?Code ?HTML ?Paragraph ?string
?Color ?HyperLink ?PostscriptFigure ?Table
?ConcatStrings ?Indent ?print ?trim
?Copyright ?LastUpdatedBy ?Roman ?TT
?DocEl ?latex ?RunDarwinSession ?View
ReplaceString
Function ReplaceString - Replace a phrase in a text
Calling Sequence: ReplaceString(old,new,txt)
Parameters:
Name Type Description
--------------------------------------
old string pattern to be replaced
new string new pattern
txt string text that will be changed
Returns:
string
Synopsis: Replaces all occurrences of a string in a text with a new string.
Examples:
> ReplaceString('east', 'west', 'one flew east');
one flew west
See also: ?SearchAllString ?SearchDelim ?SearchString
Reverse
Function Reverse - Reverse a string or a list
Calling Sequence: Reverse(s)
Parameters:
Name Type Description
-----------------------------------------
s {list,string} any string or list
Returns:
{list,string}
Synopsis: Reverses a string or a list, i.e. the first character or element
becomes the last, the second becomes the second-to-last, etc.
Examples:
> Reverse('ACTTACG');
GCATTCA
See also: ?antiparallel ?Complement ?CreateString ?string
RobinsonFoulds
Function RobinsonFoulds - Computes the pairwise Robinson-Foulds distance
between a set of trees
Calling Sequence: RobinsonFoulds(trees)
Parameters:
Name Type Description
----------------------------------
trees list(Tree) list of trees
Returns:
matrix(numeric)
Synopsis: The Robinson and Foulds (RF) distance between two trees is the
number of non-trivial bipartitions present in one of the two trees but not
the other, divided by the number of possible bi-partitions. Thus, the
smaller the RF distance between two trees the closer are their topologies.
The algorithm runs in O(m^2*n), where m is the number of trees and n the
number of leaves.
References: Pattengale, Gottlieb and Moret, "Efficiently Computing the
Robinson-Foulds Metric", J. Comp. Biol., 2007, 14(6), 724--735
Examples:
> t1 := Tree(Tree(Leaf(a,2),1,Leaf(b,2)),0,Tree(Leaf(c,2),1,Leaf(d,2))):
> t2 := Tree(Tree(Leaf(a,2),1,Leaf(d,2)),0,Tree(Leaf(c,2),1,Leaf(b,2))):
> RobinsonFoulds([t1,t2]);
[[0, 1], [1, 0]]
See also: ?BipartiteSquared ?IdenticalTrees ?IntraDistance ?Tree
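The definition in the synopsis can be sketched in Python. This is a plain
quadratic illustration, not the algorithm of the cited reference and not
Darwin's code. Trees are assumed to be nested 2-tuples of leaf names, the
function names are hypothetical, and the normalization by 2*(n-3) is an
assumption that happens to reproduce the example above.

```python
def leaf_names(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_names(left) | leaf_names(right)


def nontrivial_splits(tree):
    """Set of bipartitions with at least two leaves on each side."""
    all_leaves = leaf_names(tree)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        left, right = node
        below = walk(left) | walk(right)
        other = all_leaves - below
        if len(below) > 1 and len(other) > 1:
            # Store the split as an unordered pair so rooting does not matter.
            found.add(frozenset([below, other]))
        return below

    walk(tree)
    return found


def rf_distance(t1, t2):
    n = len(leaf_names(t1))
    possible = 2 * (n - 3)  # assumed normalization
    if possible <= 0:
        return 0.0
    return len(nontrivial_splits(t1) ^ nontrivial_splits(t2)) / possible


def rf_matrix(trees):
    return [[rf_distance(a, b) for b in trees] for a in trees]


t1 = (('a', 'b'), ('c', 'd'))
t2 = (('a', 'd'), ('c', 'b'))
matrix = rf_matrix([t1, t2])
```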
Roman
Function Roman - convert an integer to a roman numeral
Calling Sequence: Roman(n)
Parameters:
Name Type
-------------
n posint
Returns:
string
Synopsis: Roman converts a positive integer into an uppercase roman numeral.
The conversion cannot be done for n<=0. For very large numbers, the length
of the output string grows linearly with n/1000.
Examples:
> Roman(73);
LXXIII
> Roman(1948);
MCMXLVIII
> lowercase(Roman(14));
xiv
See Also:
?Block ?Document ?latex ?RunDarwinSession
?Code ?HTML ?List ?screenwidth
?Color ?HyperLink ?Paragraph ?Table
?Copyright ?Indent ?PostscriptFigure ?TT
?DocEl ?LastUpdatedBy ?print ?View
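The conversion can be sketched with the usual greedy subtraction table. The
Python below is an illustration, not Darwin's source, and the function name
roman is chosen for the example only.

```python
def roman(n):
    """Convert a positive integer to an uppercase roman numeral."""
    if n <= 0:
        raise ValueError("roman numerals are defined for n > 0 only")
    symbols = [
        (1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
        (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
        (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I'),
    ]
    parts = []
    for value, symbol in symbols:
        count, n = divmod(n, value)
        parts.append(symbol * count)  # only 'M' can repeat more than 3 times
    return ''.join(parts)
```

Since 'M' is the largest symbol, every additional thousand adds one more
'M', which is why the output length grows with n/1000.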
Romberg
Function Romberg - Integrates a function using Romberg's method
Calling Sequence: Romberg(f,a..b,eps,n)
Parameters:
Name Type Description
--------------------------------------------------------------------------------
f procedure function to integrate
a..b range (optional, default -inf..+inf) range of the integration
eps numeric (optional, default 1e-8) epsilon
n posint (optional, default 20) maximum dimension of Romberg's tableau
Returns:
numeric
Synopsis: Integrates the function numerically using Romberg's method. If the
range is not given, it integrates between -infinity and +infinity by using
the substitution int(f(x),x=-inf..+inf) =
int(f(tan(x))*(1+tan(x)^2),x=-Pi/2..Pi/2).
Examples:
> Romberg(x -> sin(x), 0..2*Pi);
2.5649e-16
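Romberg's method and the tan substitution can be sketched in Python. This is
a textbook version written for illustration, not Darwin's implementation.
The names romberg and romberg_whole_line, the stopping rule, and the minimum
of three refinement levels are assumptions of this sketch.

```python
import math


def romberg(f, a, b, eps=1e-8, n=20):
    """Integrate f over [a, b]; n bounds the size of the tableau."""
    h = b - a
    prev = [0.5 * h * (f(a) + f(b))]  # trapezoid rule, one interval
    for k in range(1, n):
        h /= 2.0
        # Halving the step only requires the new midpoints.
        new_points = sum(f(a + (2 * i - 1) * h)
                         for i in range(1, 2 ** (k - 1) + 1))
        cur = [0.5 * prev[0] + h * new_points]
        for j in range(1, k + 1):  # Richardson extrapolation
            cur.append(cur[j - 1] + (cur[j - 1] - prev[j - 1]) / (4 ** j - 1))
        # A few levels are required before the estimate is trusted.
        if k >= 3 and abs(cur[k] - prev[k - 1]) <= eps * max(1.0, abs(cur[k])):
            return cur[k]
        prev = cur
    return prev[-1]


def romberg_whole_line(f, eps=1e-8, n=20):
    """Integrate f over (-inf, +inf) with x = tan(t), dx = (1+tan(t)^2) dt."""
    def g(t):
        x = math.tan(t)
        return f(x) * (1.0 + x * x)
    return romberg(g, -math.pi / 2, math.pi / 2, eps, n)
```

For example, romberg(math.sin, 0, 2 * math.pi) is zero up to rounding, in
line with the Darwin example above.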
RotateTree
Function RotateTree - Returns a new, rotated tree
Calling Sequence: RotateTree(tree,side,sub_side)
Parameters:
Name Type Description
----------------------------------------------------------
tree Tree a tree to be rotated
side {Left,Right} the first indication about side
sub_side {Left,Right} the second indication about side
Returns:
Tree
Synopsis: Returns a new, rotated tree rooted half-way through the edge that
is indicated by the side and sub-side arguments. The leaves of the tree
should have annotated heights, but this is not strictly enforced, unless the
rotation is happening directly next to a leaf.
Examples:
> t := Tree(Tree(Leaf(A,15),5,Leaf(B,15)),0,Tree(Leaf(C,15),11,Leaf(D,15)));
t := Tree(Tree(Leaf(A,15),5,Leaf(B,15)),0,Tree(Leaf(C,15),11,Leaf(D,15)))
> newt := RotateTree(t,Left,Left);
newt := Tree(Leaf(A,5),0,Tree(Leaf(B,15),5,Tree(Leaf(C,25),21,Leaf(D,25)),100))
See also: ?AllRootedTrees ?AllTernaryRoots ?Tree
RunDarwinSession
Function RunDarwinSession - run Darwin code inside a Document and insert
results
Calling Sequence: RunDarwinSession(doc)
Parameters:
Name Type Description
------------------------------------------------------
doc structure typically a document or part of one
Returns:
structure
Synopsis: RunDarwinSession scans the input Document structure (or part of
one) and collects all the structures of type DarwinCode(string),
DarwinHideInput(string), DarwinExpression(string), DarwinHidden(string) and
DarwinCodeHTML(string). These have the following effects:
DarwinCode(string) - The string contents of this structure are interpreted as
statements to a darwin session and are collected and executed
by darwin. The output is separated into its components, and
each original DarwinCode structure is replaced by a green Code
structure containing the input and a red Code structure
containing the output.
DarwinHideInput(string) - The string contents of this structure are
interpreted as statements to a darwin session and are collected
and executed by darwin. The output is separated into its
components, and each original DarwinHideInput structure is
replaced by a red Code structure containing the output.
DarwinExpression(string) - The contents of this structure are considered to be
statements also to be merged in the darwin code and executed.
Their values will replace the structure in the Document.
DarwinExpressions serve as a mechanism to incorporate values
computed in the darwin run, which may not be known, into the
text of the Document.
DarwinHidden(string) - The contents of this structure are executed, but no
result is incorporated in the document. This is useful to set
parameters appropriately while it is unwanted to reflect this
in the resulting document. E.g. Set(gc=xxx).
DarwinCodeHTML(string) - The contents of this structure are assumed to contain
characters which are invalid in normal HTML, (like "<" and ">")
and these characters are converted to their corresponding
Entity Names. Typical uses of this structure are programs
which have the special symbols or programs which will output
HTML tags. E.g. "if a < b then ..." or printf( '%s'
).
InvokeDarwin - This is a global variable which may be assigned with the name
of a command to execute Darwin if the default ("darwin") is not
suitable. This is needed when the Darwin command is special or
it must be executed with special arguments.
DarwinOutputUpperLimit - This is a global variable which if assigned with a
positive integer (call it n) will limit the number of output
lines of each single DarwinCode() set of statements. The value
n is the number of lines to be displayed, and if the output has
more lines than n lines, the top n/2 lines will be displayed
followed by a line ". . . . (xxx output lines skipped) . . . ."
followed by the last n/2 lines. This is very useful when the
output is undesirably long, but necessary.
DarwinTimeout - This is a global variable which if assigned a positive value
will limit the execution time of the Darwin session to that
value (in seconds). By default the session is allowed to run
for 600 seconds.
RunDarwinSession is relatively robust against errors, help files, etc. It
cannot display objects which are shown with the command View.
See Also:
?Block ?Document ?latex ?Roman
?Code ?HTML ?List ?screenwidth
?Color ?HyperLink ?Paragraph ?Table
?Copyright ?Indent ?PostscriptFigure ?TT
?DocEl ?LastUpdatedBy ?print ?View
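The effect of DarwinOutputUpperLimit on a long output can be sketched as
follows. The Python below only illustrates the rule stated above and is not
Darwin's code. The name limit_output is hypothetical, and integer division
for odd n is an assumption.

```python
def limit_output(lines, n):
    """Keep the first n/2 and last n/2 lines when there are more than n."""
    if n <= 0 or len(lines) <= n:
        return list(lines)
    half = n // 2  # assumption: odd limits are rounded down
    skipped = len(lines) - 2 * half
    marker = ". . . . (%d output lines skipped) . . . ." % skipped
    return list(lines[:half]) + [marker] + list(lines[len(lines) - half:])


output = ['line %d' % i for i in range(1, 11)]
shortened = limit_output(output, 4)
```

With ten output lines and a limit of 4, the result holds lines 1 and 2, the
marker reporting 6 skipped lines, and lines 9 and 10.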
SPCommonName
Function SPCommonName - common name of the species of the entry or scientific
name
Calling Sequence: SPCommonName(entry)
Parameters:
Name Type Description
--------------------------------------------------------------
entry anything any description of an entry or entry number
Returns:
string
Synopsis: SPCommonName finds the common name of the species of the given
entry. If the input is the scientific name of a species, SPCommonName will
try to locate an entry with that name to use it. The common name is found
within parentheses in the OS entry in SwissProt databases. If the database
in DB does not conform to this rule, the function may not work properly. If
no common name is found, it returns the species name. If no species name is
found, it returns the AC or ID or "no name". This function is useful to
provide simple labels for plots.
Examples:
> SPCommonName(AC(P13475));
Slime mold
> SPCommonName('Raphicerus campestris');
Steenbok
> SPCommonName(AC(P00083));
Rhodopseudomonas viridis
See Also:
?DbToDarwin ?SearchAC ?Species_Entry
?GetEntryInfo ?SearchID ?SP_Species
SP_Species
Function SP_Species - find all the names of species in the database
Calling Sequence: SP_Species(taxon)
Parameters:
Name Type Description
--------------------------------------------------
taxon string optional taxonomic classification
Returns:
set(string)
Synopsis: SP_Species scans the database assigned to DB and returns the names
of all the species (or all the species of the given taxonomic
classification). This assumes that the database is a SwissProt database or
that at least it has the OC tags with taxonomic information. The matching
of the taxonomic information is done in a textual and case insensitive mode.
If this results in an ambiguous selection, it is possible to include a
longer portion of the taxonomic information (see examples).
Examples:
> SP_Species(Abies);
{Abies alba,Abies bracteata,Abies firma,Abies grandis,Abies holophylla,Abies homolepis,Abies magnifica,Abies mariesii,Abies sachalinensis,Abies veitchii}
> SP_Species(Pinus);
{Carpinus betulus,Carpinus caroliana,Lupinus albescens, Lupinus aureonitens,Lupinus albifrons,Lupinus albus,Lupinus angustifolius,Lupinus arboreus,Lupinus atlanticus, Lupinus digitatus, Lupinus pilosus,Lupinus cosentinii,Lupinus densiflorus,Lupinus luteus,Lupinus microcarpus,Lupinus nanus,Lupinus polyphyllus,Pinus balfouriana,Pinus banksiana,Pinus contorta,Pinus edulis,Pinus griffithii,Pinus koraiensis,Pinus krempfii,Pinus longaeva,Pinus monticola,Pinus pinaster,Pinus pinea,Pinus radiata,Pinus strobus,Pinus sylvestris,Pinus taeda,Pinus thunbergii,Pinus virginiana}
> SP_Species('Pinaceae; Pinus');
{Pinus balfouriana,Pinus banksiana,Pinus contorta,Pinus edulis,Pinus griffithii,Pinus koraiensis,Pinus krempfii,Pinus longaeva,Pinus monticola,Pinus pinaster,Pinus pinea,Pinus radiata,Pinus strobus,Pinus sylvestris,Pinus taeda,Pinus thunbergii,Pinus virginiana}
See Also:
?DbToDarwin ?SearchAC ?SPCommonName
?GetEntryInfo ?SearchID ?Species_Entry
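The examples show why a short taxon such as Pinus also selects Carpinus and
Lupinus: the match is textual. The Python sketch below mimics that behaviour
on a few made-up records. It is not Darwin's implementation; the function
name species_matching and the abbreviated OC strings are hypothetical sample
data.

```python
def species_matching(entries, taxon=None):
    """entries: iterable of (species_name, oc_text) pairs.
    Returns the set of species whose OC text contains taxon,
    compared as plain text and ignoring case."""
    if taxon is None:
        return {name for name, _ in entries}
    needle = taxon.lower()
    return {name for name, oc in entries if needle in oc.lower()}


# Abbreviated, made-up lineages for illustration only.
sample = [
    ('Pinus pinea', 'Eukaryota; Viridiplantae; Pinaceae; Pinus.'),
    ('Abies alba', 'Eukaryota; Viridiplantae; Pinaceae; Abies.'),
    ('Lupinus albus', 'Eukaryota; Viridiplantae; Fabaceae; Lupinus.'),
]
loose = species_matching(sample, 'Pinus')
strict = species_matching(sample, 'Pinaceae; Pinus')
```

Including the family name in the pattern removes the accidental substring
match, as in the SP_Species('Pinaceae; Pinus') example.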
SaveEntries
Function SaveEntries( xs, descr:string )
Save all sequences from entries xs to files descr.
ScaleIndex
Function ScaleIndex - Compute the Scale Variation Index
Calling Sequence: ScaleIndex(ma,t)
Parameters:
Name Type Description
--------------------------------------------------
ma array(string) multiple sequence alignment
t Tree a phylogenetic tree
Returns:
list(numeric)
Global Variables: ScaleIndex_MA ScaleIndex_Tree
Synopsis: Computes a variation index defined as the scale factor for pam
distances that makes Probability{position} maximal for all positions of a
multiple alignment.
Examples:
> ma := [ ' -------------------------FPEVVGKTVDQA ..(535).. CSPRKGTKT'];
ma := [ -------------------------FPEVVGKTVDQAREYFTLHYPQ , -------------------IASAGFVRDAQGNCIK--- , AKQVVLLIFGSWQLARERLANEMRKAVAY__TFL__NFDMGRQPLSMHYSDKVCSPRMSTET, AEPIVPLLFGMWRLKRKKANNKLLRCVKY__TLLARNTSDGREPVACRYSEKICSPRTGTKT, AEVIVPLLFGVWRLKREERTYTLLQCVKY__VFLARNTVAGNRPLSKKFSEKVCSPRK , AEPIVPLLFGLWQLAREKASNTLLQCVKY__VFLARNTVAGRRPLKMKYSDKVCSPRKGAKT, EPIVPLL__MWQLAIEKSSNTLLQCVK__KVFLARKTVAGRRPLSMKFSDKVCNPRKGTKT, PIVPLLFGMWQLAREKASNTLLQCVKYYYVFLARNTVAGRRPLSMKYSDKVCSPRKGTKT]
> tree := Tree(Tree(Leaf(b,-250.0000,2),-2.8422e-14, ..(271).. 00,3)))))));
tree := Tree(Tree(Leaf(Permutation([5, 6, 1, 2, 4, 7, 3]),-250,2),-2.8422e-14,Leaf(Permutation([4, 5, 6, 2, 3, 1, 7]),-250,1)),0,Tree(Leaf(h,-250,8),-209.7583,Tree(Leaf(g,-260.8121,7),-227.6537,Tree(Leaf(f,-256.9830,6),-233.8701,Tree(Leaf(d,-240.9182,4),-235.7326,Tree(Leaf(e,-252.2867,5),-237.4908,Leaf(c,-239,3)))))))
> scxd := ScaleIndex (ma, tree);
scxd := [-2.9973, -0.1326, 0.5459, -0.2116, -2.9973, 0.2001, -2.9973, -0.03674823, -2.9973, -2.9973, 0.4785, -2.9973, 0.1069, -2.9973, 0.3251, -0.03946685, -0.1529, 0.1645, 0.6408, 0.3623, -0.1573, 0.2861, -0.07401567, -0.07673429, 0.1184, 0.06689822, -0.7055, -0.02355176, -0.6150, -0.1929, 2.6355, -0.1831, 0.06961684, -0.2861, -0.6337, -0.4572, -0.07401567, -0.2176, 0.4944, 0.3738, -0.4369, -0.6825, 0.2692, -0.09976684, -0.1157, 0.03234940, -0.00219941, 0.2089, -0.2363, -0.6194, 0.1672, -2.9973, -0.1831, -2.9973, -0.2746, -2.9973, -2.9973, 0.2878, 0.2461, -0.1414, -0.05978077, -2.9973]
See also: ?KWIndex ?PlotIndex ?PrintIndex ?ProbIndex
ScaleTree
Function ScaleTree - Scales a tree to a specific height.
Calling Sequence: ScaleTree(t,h)
Parameters:
Name Type Description
--------------------------------------------------
t Tree tree to be scaled
h positive new distance from root to leaves
Returns:
Tree
Synopsis: The function ScaleTree scales a tree to a given height. If the tree
is not ultrametric, the distance from the root to the deepest leaf is
scaled. The root of the returned tree is always at height/time 0.
Examples:
> BDTree := BirthDeathTree(0.1, 0.01, 10, 50);
BDTree := Tree(Tree(Leaf(S1,50),35.8040,Leaf(S2,50)),17.1528,Tree(Tree(Tree(Leaf(S3,50),32.0575,Tree(Tree(Leaf(S4,50),44.9293,Leaf(S5,50)),37.4626,Leaf(S6,50))),28.9295,Tree(Tree(Leaf(S7,50),47.4169,Leaf(S8,50)),45.1129,Leaf(S9,50))),21.3121,Leaf(S10,50)))
> ScaledTree := ScaleTree(BDTree, 100);
ScaledTree := Tree(Tree(Leaf(S1,100.0000),56.7819,Leaf(S2,100.0000)),0,Tree(Tree(Tree(Leaf(S3,100.0000),45.3760,Tree(Tree(Leaf(S4,100.0000),84.5627,Leaf(S5,100.0000)),61.8313,Leaf(S6,100.0000))),35.8531,Tree(Tree(Leaf(S7,100.0000),92.1360,Leaf(S8,100.0000)),85.1217,Leaf(S9,100.0000))),12.6625,Leaf(S10,100.0000)))
See also: ?AddDeviation ?BirthDeathTree ?Tree
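The rescaling can be sketched as a linear map of node heights that sends the
root to 0 and the deepest leaf to h. The Python below is an illustration
consistent with the example output, not Darwin's code. It assumes leaves are
(name, height) pairs and internal nodes are (left, height, right) triples;
the function names are hypothetical.

```python
def deepest_leaf(node):
    if len(node) == 2:  # leaf: (name, height)
        return node[1]
    left, _, right = node
    return max(deepest_leaf(left), deepest_leaf(right))


def scale_tree(tree, target):
    """Return a copy of tree with the root at 0 and the deepest leaf at target."""
    root_height = tree[1]
    factor = target / (deepest_leaf(tree) - root_height)

    def rescale(node):
        if len(node) == 2:
            name, height = node
            return (name, (height - root_height) * factor)
        left, height, right = node
        return (rescale(left), (height - root_height) * factor, rescale(right))

    return rescale(tree)


# A small tree using heights taken from BDTree in the example above.
small = ((('S1', 50.0), 35.8040, ('S2', 50.0)), 17.1528, ('S10', 50.0))
scaled = scale_tree(small, 100)
```

The internal node at height 35.8040 is mapped to about 56.78, which agrees
with the corresponding node in ScaledTree.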
ScoreAlignment
Function ScoreAlignment - scores an existing codon or protein alignment
Calling Sequence: ScoreAlignment(dps1,dps2,S)
Parameters:
Name Type Description
-----------------------------------------------------------------
dps1 string First of the aligned sequences.
dps2 string Second of the aligned sequences.
S {CodonMatrix,DayMatrix} a scoring matrix
Returns:
numeric
Synopsis: This function scores two aligned sequences with a given scoring
matrix S. If S is a CodonPAM or SynPAM matrix, the sequences are
interpreted as DNA, and if S is a Dayhoff matrix, the sequences are assumed
to be proteins. The two input strings must be of the same length and can
include gaps ('___' or '_'), which are scored according to the gap cost
formula defined in the scoring matrix.
Examples:
> ScoreAlignment(AAACCCGGGTTT,AAACCG___TTT,cm);
13.7069
See Also:
?Align ?CodonMatrix ?CreateSynMatrices
?CodonAlign ?CreateCodonMatrices ?DynProgStrings
?CodonDynProgStrings ?CreateDayMatrices
ScoreIntron
Function ScoreIntron
Calling Sequence: ScoreIntron(m,intron)
Parameters:
Name Type
--------------------
m NucPepMatch
intron posint
Returns:
NULL
Synopsis: Computes the score [alpha, delta, omega] for a given intron.
Examples:
See also: ?Introns
SearchAC
Function SearchAC - find an entry with a given accession number
Calling Sequence: SearchAC(pat)
Parameters:
Name Type
-------------
pat string
Returns:
Entry
Synopsis: The SearchAC function searches the sequence database currently
assigned to system variable DB. It returns an entry data structure which
contains at most one exact match of the given argument, pat, with the AC
field of the entry. If no match can be found it returns NULL.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> SearchAC('Q62671;');
EDD_RATQ62671;Ubiquitin-- ..(1568).. V
> SearchAC(ZZZZ);
See also: ?DB ?SearchID ?SearchSeqDb ?SearchTag
SearchAllArray
Function SearchAllArray
Calling Sequence: SearchAllArray(t,A)
Parameters:
Name Type
---------------
t anything
A array
Returns:
array
Synopsis: The function SearchAllArray returns the array of all indices at
which an element occurs in an array. If the element is not a member of the
array, it returns an empty list.
Examples:
> SearchAllArray(5, [1, 2, 7, 5, 8, 5, 7, 5]);
[4, 6, 8]
> SearchAllArray('hi', ['hello', 'hallo', 'hey', 'hoi']);
[]
See also: ?SearchArray ?SearchOrderedArray ?table
SearchAllString
Function SearchAllString - Find all instances of a phrase in a text
Calling Sequence: SearchAllString(pat,txt)
Parameters:
Name Type Description
----------------------------------------
pat string a pattern that is sought
txt string a text which is searched
Returns:
list
Synopsis: The function SearchAllString returns the list of indices of all
the occurrences of the pattern in the text. If the pattern cannot be found it
returns an empty list. This function is case insensitive.
Examples:
> SearchAllString('hehe', 'hehehe');
[1, 3]
> SearchAllString('cat', 'acgcagcatgcatcagtca');
[7, 11]
See Also:
?BestSearchString ?MatchRegex ?SearchMultipleString
?CaseSearchString ?SearchApproxString ?SearchString
?HammingSearchString ?SearchDelim
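The first example shows that overlapping occurrences are reported and that
positions are counted from 1. A Python sketch with the same behaviour is
given below for illustration; it is not Darwin's implementation and the name
search_all_string is hypothetical.

```python
def search_all_string(pat, txt):
    """Return the 1-based start positions of all (possibly overlapping)
    occurrences of pat in txt, ignoring case."""
    if not pat:
        return []
    pat_lower, txt_lower = pat.lower(), txt.lower()
    hits = []
    start = txt_lower.find(pat_lower)
    while start != -1:
        hits.append(start + 1)  # convert to 1-based position
        # Advance by one character only, so overlapping matches are found.
        start = txt_lower.find(pat_lower, start + 1)
    return hits
```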
SearchArray
Function SearchArray
Option: builtin
Calling Sequence: SearchArray(t,A)
Parameters:
Name Type
-----------------------
t {numeric,string}
A array
Returns:
{0,posint}
Synopsis: The function SearchArray returns the index of an element in an
array if it is a member of the array. Otherwise it returns 0.
Examples:
> SearchArray(5, [1, 2, 7, 5, 8]);
4
> SearchArray('hi', ['hello', 'hallo', 'hey', 'hoi']);
0
See also: ?SearchAllArray ?SearchOrderedArray ?table
SearchDayMatrix
Function SearchDayMatrix - search an array of DayMatrix for a given PAM
Option: builtin
Calling Sequence: SearchDayMatrix(PAM,daymat)
Parameters:
Name Type Description
--------------------------------------------------------------------
PAM numeric PAM distance for which matrix is sought
daymat array(DayMatrix) an array of Dayhoff matrices
Returns:
DayMatrix
Synopsis: This function searches the list of DayMatrix for the Dayhoff matrix
calculated with PamNumber closest to PAM. This function assumes that daymat
is in ascending order.
Examples:
> CreateDayMatrices();
> SearchDayMatrix(250, DMS);
DayMatrix(Peptide, pam=250, Sim: max=14.152, min=-5.161, del=-19.814-1.396*(k-1))
See Also:
?CreateDayMatrices ?CreateDayMatrix ?CreateOrigDayMatrix ?DayMatrix
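Because daymat is assumed to be in ascending order, the closest PAM value
can be located with a binary search. The Python below sketches that idea on
a plain list of numbers. It is not Darwin's code; the name closest_index,
the sample PAM values, and the tie-breaking toward the smaller value are
assumptions.

```python
import bisect


def closest_index(sorted_values, target):
    """Index of the value closest to target in an ascending list."""
    if not sorted_values:
        raise ValueError("empty list")
    pos = bisect.bisect_left(sorted_values, target)
    if pos == 0:
        return 0
    if pos == len(sorted_values):
        return len(sorted_values) - 1
    before, after = sorted_values[pos - 1], sorted_values[pos]
    # Ties go to the smaller value (an arbitrary choice for this sketch).
    return pos if after - target < target - before else pos - 1


pam_values = [10, 50, 100, 250, 500]  # hypothetical PAM distances
```

If the list were not sorted, the binary search could return a wrong
neighbour, which is why the ordering assumption matters.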
SearchDb
Function SearchDb
Calling Sequence: SearchDb(pat_1..pat_k)
Parameters:
Name Type
----------------------------
pat_i {string,set(string)}
Returns:
an Entry structure
Synopsis: The SearchDb function searches the sequence database currently
assigned to system variable DB. When pat_i consists of a set of strings,
the function returns the logical OR of the results (all entries containing
at least one of the elements in the set pat_i). The comma symbol represents
the logical AND of the arguments. In this case, SearchDb returns only
those entries that contain all such patterns.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> SearchDb('platypus');
AMEL_ORNANO97646;Amelogen ..(595).. T, ATP6_ORNANQ36454;ATP synt ..(835).. T, ATP8_ORNANQ36453;ATP synt ..(569).. S, COX1_ORNANQ36452;Cytochro ..(1126).. A, COX2_ORNANQ37718;Cytochro ..(996).. S, COX3_ORNANQ36455;Cytochro ..(829).. S, CYB_ORNANQ36461;Cytochrom ..(968).. W, DLP1_ORNANP82172;Defensin ..(530).. 2, DLP2_ORNANP82140;Defensin ..(538).. 2, DLP3_ORNANP82141;Defensin ..(413).. F, HBA_ORNANP01979;Hemoglobi ..(754).. R, HBB_ORNANP02111;Hemoglobi ..(725).. H, HSP1_ORNANP35307;Sperm pr ..(603).. N, INS_ORNANQ9TQY7; Q9TQY8;I ..(614).. N, LCA_ORNANP30805;Alpha-lac ..(847).. C, MYG_ORNANP02196;Myoglobin ..(724).. G, NU1M_ORNANQ37717;NADH-ubi ..(878).. M, NU2M_ORNANQ36451;NADH-ubi ..(979).. S, NU3M_ORNANQ36456;NADH-ubi ..(593).. E, NU4M_ORNANQ36458;NADH-ubi ..(1137).. C, NU5M_ORNANQ36459;NADH-ubi ..(1343).. F, NU6M_ORNANQ36460;NADH-ubi ..(644).. H, NULM_ORNANQ36457;NADH-ubi ..(674).. C
> SearchDb('alpha-lactalbumin');
LCAA_HORSEP08334;Alpha-la ..(794).. L, LCAB_HORSEP08896;Alpha-la ..(818).. L, LCA_BOSMUQ9TSR4;Alpha-lac ..(863).. L, LCA_BOVINP00711; Q95NE4;A ..(1467).. 3, LCA_BUBBUQ9TSN6;Alpha-lac ..(882).. L, LCA_CAMDRP00710;Alpha-lac ..(851).. W, LCA_CANFAQ9N2G9;Alpha-lac ..(825).. L, LCA_CAPHIP00712;Alpha-lac ..(1215).. 2, LCA_CAVPOP00713;Alpha-lac ..(1110).. 9, LCA_EQUASP28546;Alpha-lac ..(812).. L, LCA_FELCAP37154;Alpha-lac ..(562).. P, LCA_HUMANP00709;Alpha-lac ..(1557).. 5, LCA_MACEUQ06655;Alpha-lac ..(1002).. C, LCA_MACGIP19122;Alpha-lac ..(664).. V, LCA_MACRGP07458;Alpha-lac ..(839).. C, LCA_MOUSEP29752;Alpha-lac ..(1384).. 0, LCA_ORNANP30805;Alpha-lac ..(847).. C, LCA_PAPCYP12065;Alpha-lac ..(998).. 7, LCA_PIGP18137;Alpha-lacta ..(859).. M, LCA_RABITP00716; Q9TQT7;A ..(907).. K, LCA_RATP00714; P00715;Alp ..(965).. P, LCA_SHEEPP09462; Q9GKS5;A ..(942).. L, LCA_TACACP81646;Alpha-lac ..(828).. C, LCA_TRIVUQ29145;Alpha-lac ..(889).. C
> SearchDb('platypus', 'alpha-lactalbumin');
LCA_ORNANP30805;Alpha-lac ..(847).. C
> SearchDb('alpha-lactalbumin', {'platypus', 'panda'});
LCA_ORNANP30805;Alpha-lac ..(847).. C
See Also:
?DB ?SearchAC ?SearchSeqDb ?Species_Entry
?PatEntry ?SearchID ?SearchTag
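The way arguments combine (OR inside a set, AND across arguments) can be
sketched in Python. The code below is an illustration on made-up text
records, not Darwin's implementation. The name search_db, the sample records,
and the case-insensitive substring matching are assumptions.

```python
def search_db(entries, *patterns):
    """Return the entries that satisfy every pattern; a pattern given as a
    set is satisfied when at least one of its members occurs."""
    def satisfied(entry, pattern):
        alternatives = pattern if isinstance(pattern, (set, frozenset)) else {pattern}
        text = entry.lower()
        return any(alt.lower() in text for alt in alternatives)

    return [e for e in entries if all(satisfied(e, p) for p in patterns)]


# Made-up records for illustration only.
records = [
    'LCA_ORNAN Alpha-lactalbumin (Platypus)',
    'HBA_ORNAN Hemoglobin (Platypus)',
    'LCA_HUMAN Alpha-lactalbumin (Human)',
]
both = search_db(records, 'alpha-lactalbumin', {'platypus', 'panda'})
```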
SearchDelim
Function SearchDelim - break up a string at each occurrence of a delimiter
Calling Sequence: SearchDelim(delim,txt)
Parameters:
Name Type Description
-------------------------------------------------------------
delim string a pattern that delimits portions of a string
txt string the text to be split
Returns:
list(string)
Synopsis: SearchDelim returns a list of strings, where each string in the
list is one of the parts of the txt delimited by occurrences of delim.
SearchDelim is ideal to break up a string which contains many lines
separated by newlines. If the string after the last occurrence of delim is
empty, it is not added to the list. Delimiting with an empty string does
not make sense and it is not allowed.
Examples:
> SearchDelim('a', 'abracadabra');
[, br, c, d, br]
> SearchDelim('\n', 'file1\nfile2\nfile3\n');
[file1, file2, file3]
See Also:
?BestSearchString ?Lines ?SearchMultipleString
?CaseSearchString ?MatchRegex ?SearchString
?HammingSearchString ?SearchApproxString ?SplitLines
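The two rules in the synopsis (a trailing empty piece is dropped, an empty
delimiter is rejected) can be sketched in Python. This is an illustration
that reproduces the examples above, not Darwin's implementation; the name
search_delim is hypothetical.

```python
def search_delim(delim, txt):
    """Split txt at each occurrence of delim; a trailing empty piece is
    not included in the result."""
    if delim == '':
        raise ValueError("the delimiter must not be empty")
    parts = txt.split(delim)
    if parts and parts[-1] == '':
        parts.pop()  # drop the empty string after the last delimiter
    return parts
```

A leading empty piece is kept, which is why splitting 'abracadabra' at 'a'
starts with an empty string.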
SearchFrag
Function SearchFrag - Search database for a fragment
Calling Sequence: SearchFrag(seq)
Parameters:
Name Type
-------------
seq string
Returns:
list(Match)
Synopsis: Returns all matches of seq against the peptide database assigned to
the system variable DB.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> SearchFrag('SGPPRIP');
Searching the fragment SGPPRIP in /home/darwin/DB/SwissProt.Z, Tue Feb 19 10:54:39 2013
With goal 31.8 and PAM 250, 185 matches were found
After refining with Align/DMS, 4 matches were selected
with similarity not less than 70
See Also:
?AlignOneAll ?SearchAC ?SearchID ?Species_Entry
?PatEntry ?SearchDb ?SearchSeqDb
SearchID
Function SearchID
Calling Sequence: SearchID(pat)
Parameters:
Name Type
-------------
pat string
Global Variables: SearchID_DBname SearchID_table
Synopsis: The SearchID function searches the sequence database currently
assigned to system variable DB. It returns an entry data structure which
contains at most one exact match of the given argument, pat, with the ID
field of the entry. If no match can be found it returns an empty data
structure.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> SearchID(CY2_RHOVI);
CY2_RHOVIP00083;Cytochrom ..(1021).. 6
> SearchID(ZZZZ);
See also: ?DB ?SearchAC ?SearchSeqDb ?SearchTag ?Species_Entry
SearchMassDb
Function SearchMassDb - Searches digestion fragments against a database
Option: builtin
Calling Sequence: SearchMassDb(p,n)
Parameters:
Name Type Description
----------------------------------------------------------------
p Protein description of protein (weights, enzymes, etc.)
n integer maximum number of returned matches
Returns:
MassProfileResults
Synopsis: Searches the n most significant matches of weights of digested
fragments. The search is done against the database which is currently loaded
(with the command ReadDb). This could be a protein or a nucleotide
database. The description of the protein to be searched is in terms of the
(one or many) weights resulting from digesting the protein with an enzyme.
This description can also hold other information, such as deuteration and
modified amino acid weights. See Protein and DigestionWeights for details.
The result is a data structure which contains the best n matches, ordered
from best to worst. Each match is described by the similarity score, number
of fragments in the protein, number of matched fragments, and description of
the matching protein. See MassProfileResults for full details.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> print( SearchMassDb( Protein(DigestionWeights('Trypsin',
601.9438, 504.0904, 1512.4545, 480, 590)), 5 ));
Score n k AC DE OS
60.4 21 4 P28519; DNA repair protein RAD14. Saccharomyces cerevisiae
(Baker's yeast). Unmatched weights: [1512.5].
60.0 7 3 Q43284; Oleosin 14.9 kDa. Arabidopsis thaliana (Mouse-ear cress).
Unmatched weights: [480.0, 1512.5].
59.8 17 4 P21908; Glucokinase (EC 2.7.1.2) (Glucose kinase). Zymomonas
mobilis. Unmatched weights: [590.0].
58.2 6 3 Q9FC39; Protein crcB homolog 1. Streptomyces coelicolor.
Unmatched weights: [590.0, 1512.5].
57.3 11 3 P06931; E6 protein. Bovine papillomavirus type 1. Unmatched
weights: [590.0, 601.9].
See Also:
?DigestAspN ?DigestWeights ?MassProfileResults
?DigestionWeights ?DynProgMass ?ProbBallsBoxes
?DigestSeq ?DynProgMassDb ?ProbCloseMatches
?DigestTrypsin ?enzymes
SearchMultipleString
Function SearchMultipleString - search several sequential patterns in a string
Calling Sequence: SearchMultipleString(pat1,pat2,...,txt)
Parameters:
Name Type
--------------
pat_i string
txt string
Returns:
list(integer)
Synopsis: The SearchMultipleString function returns a list with the offsets
of all the matches of each of the patterns given as arguments. This is very
useful when one wants to search for a portion of a string enclosed in some
particular context. The individual patterns are matched as case
insensitive. All the patterns have to match, in a non-overlapping way and
in the given order. If there is no match of all the patterns, the function
returns an empty list.
Examples:
> SearchMultipleString( '(', 'a', ')', '(),(bbb), (...a...)' );
[0, 14, 18]
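The matching rule described above can be sketched in Python (not Darwin code). The sketch assumes that each pattern is matched at its leftmost position after the end of the previous match and that offsets are 0-based; both assumptions are inferred from the example output, not stated by the source. The name search_multiple_string is hypothetical.

```python
def search_multiple_string(*args):
    """Return the offsets of the patterns matched in order, or [] if any fails.

    The last argument is the text; all preceding arguments are patterns.
    Assumption: leftmost, non-overlapping, case-insensitive matching.
    """
    *patterns, txt = args
    lowered = txt.lower()
    offsets = []
    start = 0
    for pat in patterns:
        pos = lowered.find(pat.lower(), start)
        if pos < 0:
            return []          # one pattern failed, so there is no match at all
        offsets.append(pos)
        start = pos + len(pat)  # the next pattern must start after this match
    return offsets


print(search_multiple_string('(', 'a', ')', '(),(bbb), (...a...)'))
```

Under these assumptions the sketch reproduces the offsets of the example above.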
See Also:
?BestSearchString ?MatchRegex ?SearchString
?CaseSearchString ?SearchApproxString
?HammingSearchString ?SearchDelim
SearchOrderedArray
Function SearchOrderedArray
Option: builtin
Calling Sequence: SearchOrderedArray(target,L)
Parameters:
Name Type Description
-----------------------------------------------------------
target {numeric,string} target to be searched for
L {array,list} array or list to be searched in
Returns:
{0,posint}
Synopsis: The SearchOrderedArray function returns the first index i such that
L[i] <= target < L[i+1].
Examples:
> SearchOrderedArray(5, [2, 4, 6, 8, 10]);
2
> SearchOrderedArray('mike', ['chantal', 'gaston', 'mike', 'ulrike', 'xianghong']);
3
> SearchOrderedArray(5, [10, 8, 6, 4, 2]);
0
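For an array in ascending order, the index described in the synopsis equals the number of elements that are not greater than the target. A Python sketch (not Darwin code) using the standard bisect module illustrates this. The name search_ordered_array is hypothetical, and the sketch makes no claim about how Darwin handles input that is not in ascending order.

```python
from bisect import bisect_right


def search_ordered_array(target, L):
    """Return the 1-based index i with L[i] <= target < L[i+1], or 0.

    Assumes L is sorted in ascending order. bisect_right counts the
    elements <= target, which is exactly that 1-based index.
    """
    return bisect_right(L, target)


print(search_ordered_array(5, [2, 4, 6, 8, 10]))
print(search_ordered_array('mike', ['chantal', 'gaston', 'mike', 'ulrike', 'xianghong']))
```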
See also: ?SearchAllArray ?SearchArray ?table
SearchSeqDb
Function SearchSeqDb
Option: builtin
Calling Sequence: SearchSeqDb(txt)
Parameters:
Name Type Description
---------------------------------------------------------------
txt {string,string..string} sequence string to be searched
Returns:
PatEntry
Synopsis: Finds all the occurrences of txt in the amino acid sequence part of
DB.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> SearchSeqDb('SGPPRIP');
PatEntry(46915583..46915583)
See also: ?AlignOneAll ?SearchFrag
SearchString
Function SearchString - case insensitive exact string searching
Option: builtin
Calling Sequence: SearchString(pat,txt)
Parameters:
Name Type Description
----------------------------------------
pat string a pattern that is sought
txt string a text which is searched
Returns:
{-1,0,posint}
Synopsis: Returns the offset at which pat matches in txt, that is, the number
of characters of txt that precede the match. If pat does not match txt, -1 is
returned. This function is case insensitive.
Examples:
> SearchString('HerE', 'It is in hERe');
9
> SearchString('where', 'wear am i');
-1
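The offset convention can be illustrated with a short Python sketch (not Darwin code). It assumes that the first occurrence is the one reported, which the source does not state explicitly; the name search_string is hypothetical.

```python
def search_string(pat, txt):
    """Case-insensitive search; return the 0-based offset of pat in txt, or -1."""
    # Assumption: the leftmost occurrence is reported.
    return txt.lower().find(pat.lower())


print(search_string('HerE', 'It is in hERe'))
print(search_string('where', 'wear am i'))
```

The nine characters of 'It is in ' precede the match, which is why the first example yields 9.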
See Also:
?BestSearchString ?MatchRegex ?SearchMultipleString
?CaseSearchString ?SearchApproxString
?HammingSearchString ?SearchDelim
SearchTag
Function SearchTag
Option: builtin
Calling Sequence: SearchTag(tg,txt)
Parameters:
Name Type Description
------------------------------------------------------------------------
tg string an SGML tag without the surrounding angle brackets
txt string the text in which the field defined by tg is searched
Returns:
string
Synopsis: The SearchTag function extracts the information surrounded by the
SGML tag tg in the body of the text txt. If tg is not found in txt, the empty
string is returned.
Examples:
> SearchTag('AC', '<ID>ABL1_CAEEL</ID><AC>P03949;</AC>');
P03949;
See also: ?SearchAC ?SearchDb ?SearchID ?Species_Entry
SendDataTcp
Function SendDataTcp( machine:string, pid:posint, data:string )
Sends data to pid on machine.
SendTcp
Function SendTcp
Option: builtin
Calling Sequence: SendTcp(data)
Parameters:
Name Type Description
----------------------------------------
data string command to the IPC daemon
Returns:
NULL
Synopsis: SendTcp sends data to the IPC daemon. This data is usually a
command understood by darwinipc. See ?darwinipc. A SendTcp is followed by a
ReceiveTcp to read out the response from the daemon.
Examples:
> r := traperror(ConnectTcp('/tmp/.ipc/darwin', false));
> SendTcp('PING'); r := ReceiveTcp(3);
r := PING OK
> SendTcp('MSTAT linneus1'); r := ReceiveTcp(3);
r := DATA linneus1 0:OK ALIVE
> DisconnectTcp();;
See Also:
?ConnectTcp ?ipcsend ?ParExecuteTest ?SendDataTcp
?darwinipc ?ParExecuteIPC ?ReceiveDataTcp
?DisconnectTcp ?ParExecuteSlave ?ReceiveTcp
Sequence
Function Sequence - Searching and retrieving sequences in the database DB
Option: polymorphic
Calling Sequence: Sequence(off)
Parameters:
Name Type Description
--------------------------------------------------------------------------------------
off {integer,list,string,structure} entries or list of entries in the database DB
Data structure of type PatEntry, AC or ID
Returns:
Sequence
Synopsis: Sequence will return the peptide or nucleotide sequence pointed to by
the argument(s). This normally consists of the field enclosed by the tags
<SEQ> and </SEQ>. Sequence returns a string or an expression sequence of
strings. When the argument is an ID or an AC structure, the database is
searched for the corresponding ID or AC. If the argument is an integer, it
is taken to be a database offset into a sequence. In this case the maximal
sequence starting at that offset is returned. Otherwise, the arguments are
treated as the arguments for Entry, and their sequences extracted.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
> s1 := Sequence(Entry(1));
s1 := MKFLILLFNILCLFPVLAADNHGVGPQGASGVDPITFDINSNQTGPAFLT ..(924).. ILVVSLIVGIL
> Sequence(PatEntry(10000..10001));
A, A
> Sequence(AC('P11341'));
MAYRGFKTSRVVKHRVRRRWFNHRRRYR
> Sequence(ID('ID5B_PROJU'));
SDRCKDLGISIDEENNRRLVVKDGDPLAVRFVKANRRG
> GetEntryNumber(s1);
1
See Also:
?AC ?ID ?PatEntry ?SearchID
?Entry ?Match ?SearchAC ?Species_Entry
ServerSocket
Function ServerSocket - Listen on a unix domain socket, fork to process
requests
Calling Sequence: ServerSocket(socket_path)
Parameters:
Name Type Description
--------------------------------------------------------------------------
socket_path string path where the server socket is created and listened on
gc posint (optional) gc frequency of child process
Returns:
string
Synopsis: ServerSocket is a function which creates a unix domain socket and
starts listening on it. Each line sent to the socket will fork the darwin
process. The child process will get the received string as the return value
of the function, and everything it sends to standard output will be sent back
through the socket to the client program. The parent process will wait
forever and has to be killed externally. Note that garbage collection will
require a lot of data to be copied. Hence, the gc frequency will by default be
assigned a very high value, and child processes should not run too long.
Examples:
> req := ServerSocket('/tmp/server_square');
req := 5
> print(req ^ 2);
> quit;
See Also:
?CallSystem ?LockFile ?OpenReading ?TimedCallSystem
?gc ?OpenPipe ?Set
Set
echo mapsize plotdevice plotoutput printgc profile prompt quiet server screenwidth TotalDPCells
Function Set - Set system options and defaults
Option: builtin
Calling Sequence: Set(opt)
Parameters:
Name Type
--------------------------------
opt {string, string=anything}
Returns:
anything : previous value of the system variable
Synopsis: The Set command is used to assign to system variables.
Name Type Description
--------------------------------------------------------------------------------
BytesAlloc posint Returns the number of allocated bytes
echo posint Sets the level of input/output information displayed.
0 - no echo under any circumstance
1 - (default) echo whenever the input or the output
are not from/to the terminal, but do not echo as a
result of a read statement.
2 - echo whenever the input or the output are not
from/to the terminal.
3 - echo only as a result of read statements
4 - echo everything.
n - (n > 4): echo only as a result of read statements
nested less than n-4
The echo option is superseded by quiet, i.e.
if quiet=true, no echo will occur.
gc integer Sets the frequency (in words allocated) for garbage
collection.
mapsize integer Sets the minimum size (in chars) required for Darwin
to build a .map file for a database.
plotdevice string Sets the protocol for subsequent Draw commands.
(options: portrait (8.5x11 with 1/2" margin)
landscape (11x8.5 with 1/2" margin)
portraitA4 (210x297 with 1/2" margin)
landscapeA4 (297x210 with 1/2" margin))
plotoutput filename Name of the file to store the plotted code.
printgc boolean Toggles displaying garbage collection information.
printlevel integer Sets the amount of information which is printed out
during execution.
profile boolean Toggles printer/plotter profile mode
prompt string Sets the Darwin prompt.
quiet boolean Toggles the suppression of output.
screenwidth posint Sets the width of a line for all subsequent output.
server boolean Places Darwin in server mode.
TotalDPCells posint Returns/sets the number of cells computed for DynProgr
Examples:
> Set(printgc);
false
> Set(plotdevice=landscape);
landscape
SetRand
Function SetRand
Option: builtin
Calling Sequence: SetRand(seed)
Parameters:
Name Type
--------------
seed integer
Returns:
NULL
Synopsis: Sets the seed of the random number generator. The sequence of
pseudo-random numbers generated depends uniquely on the seed, i.e. the same
seed will generate the same sequence.
Examples:
> SetRand(123);
See also: ?Rand ?SetRandSeed
SetRandSeed
Function SetRandSeed
Calling Sequence: SetRandSeed()
Returns:
NULL
Global Variables: SetRandSeed_value
Synopsis: Initialize the random number generator to produce a sequence
depending on the date, time and process id. This is normally a guarantee
that different processes end up with different random seeds. If printlevel
is 3 or higher, SetRandSeed will print the value that it has used for
SetRand() so that the same random sequence can be regenerated.
Examples:
> SetRandSeed();
> Rand();
0.4405
See also: ?CreateRandSeq ?Rand ?SetRand ?Shuffle
SetupRA
Function SetupRA - setup of the relative adaptiveness for CAI
Calling Sequence: SetupRA(mode)
Global Variables: CodonProb RA
Synopsis: Assigns the global variable RA needed by ComputeCAI.
See also: ?ComputeCAI ?RelativeAdaptiveness
SetuptRNA
Function SetuptRNA - set up functions for tRNA translations
Calling Sequence: SetuptRNA(d)
Parameters:
Name Type Description
-------------------------------------------------------
d list(list) a list (by aa) of lists of codons or
d string the name of a known table of tRNA
Returns:
NULL
Global Variables: CIntTotInt_list IntTotInt_list ntRNA tIntToCInt_list
tIntToInt_list tIntTotRNA_list
Synopsis: This function sets up all the necessary functions to translate
tRNAs. These are from tInt to A, AAA, Amino, Int, CInt and Codon and from
Int and CInt to tRNA and tInt. Its input is either a string (which means a
predefined name) or it is a list of 20 (one per amino acid) lists of tRNAs.
The format is best given by an example, see the file lib/SetuptRNA.
Execution of SetuptRNA causes the following functions and values to be
defined:
Name Description
-----------------------------------------------------------------
ntRNA integer, the number of tRNA molecules used
-----------------------------------------------------------------
tIntToInt tInt (1..ntRNA) to Int (aa number, 1..20)
tIntToA tInt (1..ntRNA) to A (aa one-letter code)
tIntToAAA tInt (1..ntRNA) to AAA (aa 3-letter code)
tIntToAmino tInt (1..ntRNA) to Amino (aa full name)
-----------------------------------------------------------------
tIntToCInt tInt (1..ntRNA) to set of CInt (codon number, 1..64)
tIntToCodon tInt (1..ntRNA) to set of Codon (3-letter codon)
-----------------------------------------------------------------
tIntTotRNA tInt (1..ntRNA) to tRNA (tRNA name)
tRNATotInt tRNA (tRNA name) to tInt (1..ntRNA)
-----------------------------------------------------------------
IntTotInt Int (aa number, 1..20) to set of tInt (1..ntRNA)
IntTotRNA Int (aa number, 1..20) to set of tRNA (tRNA name)
-----------------------------------------------------------------
CIntTotInt CInt (codon number, 1..64) to tInt (1..ntRNA)
CIntTotRNA CInt (codon number, 1..64) to tRNA (tRNA name)
Currently the following names are recognized as arguments for SetuptRNA:
[Archaea, Bacteria, Eukaryota, eukaryotes, prokaryotes, YEAST, yeast]
Examples:
> SetuptRNA(yeast);
See also: ?ComputeTPI ?TPIDistr
ShortestPath
Function ShortestPath - shortest path from one node to all others
Calling Sequence: ShortestPath(g,i,excl)
Parameters:
Name Type Description
----------------------------------------------
g Graph given graph
i anything starting node
excl set (optional) excluded node set
Returns:
list([posint, numeric])
Synopsis: Compute the shortest path from node i to every connected node in g.
It is assumed that a non-negative numeric label on an Edge is the length of
the edge, that is the distance between the corresponding nodes. "excl" is
the set of nodes not to be considered and defaults to {}.
Examples:
> g := Graph( Edges(Edge(1.2,1,2),Edge(2,1,4),Edge(3,1,5),Edge(4,2,3),Edge(5,3,4)),Nodes(1,2,3,4,5));
g := Graph(Edges(Edge(1.2000,1,2),Edge(2,1,4),Edge(3,1,5),Edge(4,2,3),Edge(5,3,4)),Nodes(1,2,3,4,5))
> ShortestPath(g,1);
[[1, 0], [2, 1.2000], [3, 5.2000], [4, 2], [5, 3]]
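One standard way to compute such distances is Dijkstra's algorithm; the source does not say which algorithm Darwin uses. The Python sketch below (not Darwin code) assumes undirected edges given as (length, node1, node2) tuples, mirroring Edge(label, n1, n2) in the example. The name shortest_path is hypothetical.

```python
import heapq


def shortest_path(edges, start, excl=frozenset()):
    """Dijkstra's algorithm; returns a sorted list of [node, distance] pairs.

    edges: iterable of (length, u, v) with non-negative lengths (undirected).
    excl:  nodes that must not be used, as in the optional third argument.
    """
    adj = {}
    for length, u, v in edges:
        if u in excl or v in excl:
            continue
        adj.setdefault(u, []).append((v, length))
        adj.setdefault(v, []).append((u, length))
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, length in adj.get(u, []):
            nd = d + length
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return sorted([node, d] for node, d in dist.items())


edges = [(1.2, 1, 2), (2, 1, 4), (3, 1, 5), (4, 2, 3), (5, 3, 4)]
print(shortest_path(edges, 1))
```

Node 3 is reached through node 2 (1.2 + 4) because the route through node 4 (2 + 5) is longer. Excluding node 2 forces the longer route.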
See Also:
?BipartiteGraph ?Graph_minus ?Nodes
?Clique ?Graph_Rand ?ParseDimacsGraph
?DrawGraph ?Graph_XGMML ?Path
?Edge ?InduceGraph ?RegularGraph
?EdgeComplement ?MaxCut ?TetrahedronGraph
?Edges ?MaxEdgeWeightClique ?VertexCover
?FindConnectedComponents ?MinCut
?Graph ?MST
Shuffle
Function Shuffle
Calling Sequence: Shuffle(t)
Parameters:
Name Type
--------------------------------
t {string, list, structure}
Returns:
type(t)
Synopsis: Randomly permute the characters (when t is a string) or components
(when t is a list or a structure). A new object is created and the argument
is left unchanged.
Examples:
> Shuffle('abcdefghijklmnopqrstuvwxyz');
boujaqhsgrwldpkziexvctymfn
> Shuffle([1,2,3,4]);
[4, 3, 2, 1]
> Shuffle(ABC(a1,a2,a3,a4));
ABC(a1,a4,a2,a3)
See also: ?CreateRandPermutation ?CreateRandSeq ?Mutate ?Permutation
Signature
Function Signature( )
Calculate the signature for a specific data type
Trees: the signature is the same for isomorphic trees, and
for trees with different roots. Only the graph topology
is relevant.
The function has the following form: to get the signature
for two leaves a and b that are connected to the same node c,
the signature value for node c is
(x^a + x^b) modulo n.
n is a large number, i.e. 2^32
x is a "generator" number, which means that
x^1 mod n, x^2 mod n, etc. produce all numbers between
0 and n-1.
SignedSynteny
Function SignedSynteny - find the number of inversions of a signed permutation
Calling Sequence: SignedSynteny(perm)
Parameters:
Name Type Description
------------------------------------
perm list(integer) a permutation
Returns:
integer
Synopsis: SignedSynteny finds the minimum number of reversals needed to
transform the input permutation into an ascending straight run of positive
integers. The input permutation is a list of length n of the integers from
1 to n, where each number is also assigned a sign plus or minus (plus is
implicit). A reversal operation modifies a signed permutation by swapping
the order of a particular contiguous range and flipping the sign of the
elements in the range. The problem of finding the synteny distance between
two genomes, with known direction of every gene in the genomes, can be
reduced to the problem of finding the number of reversals. SignedSynteny
runs in O(n) and is an implementation of the algorithm described in "Kaplan
et al., Faster and simpler algorithm for sorting signed permutations by
reversals, SODA '97, ISBN:0-89871-390-0, 344-351, 1997.", except for a sub-
algorithm to find connected components in a special graph. A faster
algorithm to find the connected components is given in "Bader et al., A
linear-time algorithm for computing inversion distance between signed
permutations with an experimental study, WADS '01, ISBN:3-540-42423-7, 365-
376, 2001."
Examples:
> SignedSynteny([8, 9, -6, -1, 3, 5, -7, 2, 4]);
8
> SignedSynteny([4, 5, 6, -3, -1, -2]);
4
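The reversal operation can be made concrete with a brute-force Python sketch (not Darwin code, and not the linear-time algorithm cited above). It runs a breadth-first search over signed permutations, so it is only practical for very small n. The name reversal_distance is hypothetical.

```python
from collections import deque


def reversal_distance(perm):
    """Minimum number of signed reversals that sort perm into 1, 2, ..., n.

    Brute-force breadth-first search; exponential in n, for illustration only.
    """
    start = tuple(perm)
    n = len(start)
    goal = tuple(range(1, n + 1))
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        p, d = queue.popleft()
        for i in range(n):
            for j in range(i, n):
                # Reverse the range i..j and flip the sign of each element in it.
                q = p[:i] + tuple(-x for x in reversed(p[i:j + 1])) + p[j + 1:]
                if q == goal:
                    return d + 1
                if q not in seen:
                    seen.add(q)
                    queue.append((q, d + 1))
    raise ValueError('input is not a signed permutation of 1..n')


print(reversal_distance([4, 5, 6, -3, -1, -2]))
```

One shortest sorting sequence for the second example is: 4 5 6 -3 -1 -2, then 1 3 -6 -5 -4 -2, then 1 2 4 5 6 -3, then 1 2 -6 -5 -4 -3, then 1 2 3 4 5 6.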
See also: ?DrawTree ?GapTree ?LeastSquaresTree ?PhylogeneticTree ?Synteny
SmallAllAll
Function SmallAllAll - do an all-against-all matching of a small database
Calling Sequence: SmallAllAll(MinSim)
Parameters:
Name Type Description
-------------------------------------------------------------
MinSim numeric optional cutoff value for match similarity
Returns:
NULL
Synopsis: This function does a complete match of all sequences in a database
against each other. A database must have been loaded previously with the
ReadDb command. This function works more like a program and it prints all
sorts of information about the all-all matching. A file named
DB[Filename].AA is created with the darwin-readable results of the matrix of matches.
Besides the matrix of matches, the file contains commands to build a
phylogenetic tree, a probabilistic ancestral sequence and a multiple
alignment of all the sequences. It is expected that the user will inspect
this file, and choose which commands to run. Some of the less used commands
are commented out in the output file. If the sequences of the database are
disconnected in several groups, that is, no significant match can be found
between the sequences, these groups are placed in different files named
DB[FileName].i for consecutive values of i. If MinSim is omitted it
defaults to 100.
See also: ?AlignOneAll ?Match ?ReadDb
SortedMA
Function SortedMA( mulAlign:array(string), tree:Tree )
Returns the sequences of the multiple alignment sorted in the order
of the original database
SpToDarwin
Function SpToDarwin( flatfile:string, darwinfile:string, descr:string, compressed:boolean )
Converts a SwissProt flat file (flatfile) into a Darwin
loadable file (darwinfile). The actual data is prefixed by descr
which should contain the database name (DBNAME tag) and release
(DBRELEASE tag).
If compressed is specified and true, the flat file is read using zcat.
SpeciesCode
Function SpeciesCode - NCBI TaxonId to SwissProt species code
Calling Sequence: SpeciesCode(tax)
Parameters:
Name Type Description
---------------------------------
tax posint NCBI taxonomic ID
Returns:
string
Synopsis: Maps an NCBI taxonomic identifier to the SwissProt species code. If
the ID is not known, the function returns an error.
Examples:
> SpeciesCode(9606);
HUMAN
See also: ?TaxonId ?UpdateSpeciesCode
Species_Entry
Function Species_Entry - find all the entries for a given species
Calling Sequence: Species_Entry(specname)
Parameters:
Name Type Description
-----------------------------------
specname string species name(s)
Returns:
list(Entry)
Global Variables: SearchOS_table Species_table
Synopsis: Species_Entry returns all the entries in DB (which must be assigned
a sequence database) which match the given specname. This assumes that the
database has a field tagged with <OS>..</OS> where the species information
is available. This is rather specific to SwissProt. The first time
Species_Entry is called, it builds a table of species and it may require
some time to compute. Following calls will be much more efficient.
Examples:
> Species_Entry('Abies firma');
[MATK_ABIFIQ9MV51;Maturase ..(1002).. S, RBL_ABIFIO78258;Ribulose ..(1081).. K]
See Also:
?DbToDarwin ?SearchAC ?SPCommonName
?GetEntryInfo ?SearchID ?SP_Species
SplitLines
Function SplitLines - make a list of lines from a string
Calling Sequence: SplitLines(s)
Parameters:
Name Type Description
---------------------------------------------------
s string a string which may contain newlines
Returns:
list(string)
Synopsis: SplitLines takes a string and breaks it after every newline
character ('\n'). Each of these lines is placed in an output list. If the
string does not end in a newline, the last string of the list will not end
in a newline. In other words, SplitLines just splits the string, it does
not introduce or remove any characters.
Examples:
> SplitLines('abc');
[abc]
> SplitLines('abc
xyz');
[abc
, xyz]
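The splitting rule (break after every newline, keep all characters) can be sketched in Python (not Darwin code). The name split_lines is hypothetical, and the behaviour for an empty string is an assumption of this sketch.

```python
def split_lines(s):
    """Split s after every newline character, keeping the newlines."""
    lines = []
    start = 0
    while start < len(s):
        end = s.find('\n', start)
        if end < 0:
            lines.append(s[start:])      # last piece has no trailing newline
            break
        lines.append(s[start:end + 1])   # keep the newline with its line
        start = end + 1
    return lines


pieces = split_lines('abc\nxyz')
print(pieces)
print(''.join(pieces) == 'abc\nxyz')
```

Because nothing is added or removed, concatenating the pieces always gives back the original string.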
See Also:
?FileStat ?LockFile ?ReadLine ?ReadRawFile ?TimedCallSystem
?Lines ?OpenPipe ?ReadOffsetLine ?SearchDelim
Stat
Class Stat - Basic Univariate Statistics Package
Template: Stat()
Stat(Description)
Returns:
Stat
Fields:
Name Type Description
-----------------------------------------------------------------------
Number integer number of observations recorded
Mean numeric mean of the sample
Average numeric mean of the sample (same as Mean)
Variance numeric variance of the sample
VarVariance numeric variance of the observed variance
Skewness numeric coefficient of skewness (sidewise leaning)
Excess numeric excess (flatness, or kurtosis)
Min numeric the minimum of the sample
Minimum numeric the minimum of the sample
Max numeric the maximum of the sample
Maximum numeric the maximum of the sample
ShortForm string Description: MeanVar
StdErr numeric 95% conf. interval of mean
CV numeric coefficient of variation (std. dev/mean)
Description string user-defined description
MeanVar string form: xxx+-xx (mean and 95% conf. interval)
VarVar string form: xxx+-xx (variance and 95% conf. interval)
Methods: HTMLC plus print printf printpm Rand rawprint select
Stat_type string times union
Synopsis: Stat defines a new data structure to gather univariate statistical
information. Methods exist for printing, adding and creating a union of two
Stat data structures. The extraction of useful statistical data from the
information collected in a Stat data structure is performed with the
provided selectors.
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 26.1
Examples:
> BooHoo := Stat('Stock Market Losses');
BooHoo := Stat(0,1.7797162035136915e+308,-1.7797162035136915e+308,0,0,0,0,0,Stock Market Losses)
> BooHoo2 := Stat('More Losses');
BooHoo2 := Stat(0,1.7797162035136915e+308,-1.7797162035136915e+308,0,0,0,0,0,More Losses)
> UpdateStat( BooHoo, 10000 ):
> UpdateStat( BooHoo, 30000 ):
> UpdateStat( BooHoo2, 50000 ):
> UpdateStat( BooHoo2, 60000 ):
> BooHoo[Mean];
20000
> BooHoo[Number];
2
> Akk := BooHoo union BooHoo2;
Akk := Stat(4,10000,60000,30000,1700000000,27000000000000,1130000000000000000,30000,Stock Market Losses and More Losses)
> print(BooHoo);
Stock Market Losses: number of sample points=2
mean = 20000 +- 19600
variance = 200000000 +- 999999
skewness=999999, excess=999999
minimum=10000, maximum=30000
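The mean, variance and confidence interval printed above can be checked by hand. The Python sketch below (not Darwin code) assumes that Variance is the unbiased sample variance and that the +- term is 1.96 standard errors of the mean; both are inferred from the printed numbers, not stated by the source.

```python
import math
import statistics

losses = [10000, 30000]                 # the two values added to BooHoo

mean = statistics.mean(losses)
variance = statistics.variance(losses)  # unbiased: divides by n - 1
# Assumed 95% interval: 1.96 standard errors of the mean.
half_width = 1.96 * math.sqrt(variance / len(losses))

print(mean, variance, round(half_width, 6))
```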
See Also:
?CollectStat ?ExpFit ?LinearRegression ?UpdateStat
?Counter ?ExpFit2 ?OutsideBounds
StatTest
Chi-Square G-test Independence Friedman-Rafsky
Function StatTest - Test a statistical hypothesis
Option: polymorphic
Calling Sequence: StatTest(test,data)
Parameters:
Name Type Description
-------------------------------------------------------------------------
test string Indicator which test should be done
data anything data used to test the hypothesis (type depends on test)
Returns:
TestStatResult
Synopsis: This function tests several statistical hypotheses. The type of
hypothesis to be tested is indicated via the first argument. Tests
implemented so far:
ChiSquare One-dimensional Chi-square test of independence (cells are
assumed equally probable). "data" is a one-dimensional array
of counts (non-negative integers). The data can also be a
table of counts which must be indexed over the integers. Every
non-zero entry of the table will be assumed an entry in the
data.
ChiSquare Two-dimensional Chi-square test of independence (rows and
columns are assumed independent). "data" is a two-dimensional
array of counts (non-negative integers). The data can also be
a table of counts which must be indexed over pairs of integers
(lists of two integers). Every non-zero entry of the table
will be assumed an entry in the data.
Independence Two arrays of (any type of) data are grouped to test their
independence. The most significant Chi-square test is
reported.
FriedmanRafsky Tests whether two samples, usually multivariate, come from the
same distribution. Each sample must be input as a matrix in
which each column is a sample.
G One-dimensional G test of independence (cells are assumed
equally probable). This is an instance of the likelihood ratio
test applied to a list of equiprobable events. "data" is a
one-dimensional array of counts (non-negative integers). The
data can also be a table of counts which must be indexed over
the integers. Every non-zero entry of the table will be
assumed an entry in the data.
G Two-dimensional G test of independence (rows and columns are
assumed independent). This is an instance of the likelihood
ratio test applied to tableaux. "data" is a two-dimensional
array of counts (non-negative integers). The data can also be
a table of counts which must be indexed over pairs of integers
(lists of two integers). Every non-zero entry of the table
will be assumed an entry in the data.
For each hypothesis an internal function will be called that computes the
test statistic from the data, the p-value from Cumulative and the standardized
deviation from CumulativeStd.
References: Rice JA, Mathematical Statistics and Data Analysis, 2nd ed.
chapter 13.4, p.489 Friedman, Rafsky (1979) "Multivariate Generalizations
of the Wald-Wolfowitz and Smirnov Two-Sample Tests"
Examples:
> StatTest( ChiSquare,[[1,2,3],[4,5,6],[7,8,9]] );
TestStatResult(ChiSquare,0.4688,0.9765,-1.9858,[[1, 2, 3], [4, 5, 6], [7, 8, 9]],Degrees_of_freedom = 4)
> StatTest( Independence, [A,B,B,B,B,A], [-1,3,4,3,4,-3] );
TestStatResult(ChiSquare,1.5000,0.2207,0.7699,[[2, 0], [2, 2]],Degrees_of_freedom = 1)
> StatTest( FriedmanRafsky, [[1,5],[2,-1],[1,3]], [[1,-1],[3,4]] );
TestStatResult(FriedmanRafsky,0.6547,0.5127,0.6547)
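The statistic in the first two examples can be reproduced with the textbook Pearson chi-square formula for a two-dimensional table of counts. The Python sketch below (not Darwin code) computes only the statistic and the degrees of freedom, not the p-value or the standardized deviation. The name chi_square_2d is hypothetical, and the sketch assumes that no row or column total is zero.

```python
def chi_square_2d(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns are independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


print(chi_square_2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
print(chi_square_2d([[2, 0], [2, 2]]))
```

The first call gives 0.46875 with 4 degrees of freedom, and the second gives 1.5 with 1 degree of freedom, in agreement with the values 0.4688 and 1.5000 shown above.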
See Also:
?Cumulative ?OutsideBounds ?ProbCloseMatches ?Std_Score
?CumulativeStd ?ProbBallsBoxes ?Rand ?TestStatResult
Std_Score
Function Std_Score - conversion from standard deviations to Score
Calling Sequence: Std_Score(s)
Parameters:
Name Type Description
------------------------------------------------
s numeric a number of standard deviations
Returns:
numeric
Synopsis: This function converts a probability expressed in terms of standard
deviations to a Score (-10*log10(Prob)). This is done in such a way that
very large values can be handled with precision and without causing
overflow/underflow. Formally, a Score is defined as:
Score = -10 * log10( Prob{ Normal(0,1) < s } )
Examples:
> Std_Score( -30 );
1973.0921
> Std_Score( +30 );
2.131e-197
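The definition can be evaluated without losing precision by working with the tail probability on whichever side is small. The Python sketch below (not Darwin code) is one way to do this with the standard math module; it is not claimed to be Darwin's method, and it only works while erfc does not underflow (roughly |s| below 38). The name std_score is hypothetical.

```python
import math


def std_score(s):
    """Score = -10 * log10( Prob{ Normal(0,1) < s } ), computed stably."""
    if s <= 0:
        # The lower tail is small: compute it directly.
        p = 0.5 * math.erfc(-s / math.sqrt(2.0))
        return -10.0 * math.log10(p)
    # The probability is close to 1: use the upper tail q and log1p(-q).
    q = 0.5 * math.erfc(s / math.sqrt(2.0))
    return -10.0 * math.log1p(-q) / math.log(10.0)


print(std_score(-30))
print(std_score(30))
```

A naive evaluation of the second case would round the probability to exactly 1 and return 0; using log1p keeps the tiny positive value.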
See Also:
?Cumulative ?OutsideBounds ?ProbCloseMatches ?StatTest
?CumulativeStd ?ProbBallsBoxes ?Rand
Student_Rand
Function Student_Rand - Generate random Student's-t distributed reals
Calling Sequence: Rand(Student(nu))
Parameters:
Name Type
------------------
nu nonnegative
Returns:
numeric
Synopsis: This function returns a random Student's t distributed number with
average 0 and variance nu/(nu-2) (for nu > 2). If X is a Normal(0,1) random variable and
X1 is a Chi-square random variable with parameter nu, X/sqrt(X1/nu) is
Student(nu) distributed. Student_Rand uses Rand() which can be seeded by
either the function SetRand or SetRandSeed.
References: Handbook of Mathematical functions, Abramowitz and Stegun, 26.7
Examples:
> Rand(Student(3));
-0.9813
> Rand(Student(100));
0.00779824
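The construction given in the synopsis, X/sqrt(X1/nu), can be sketched in Python (not Darwin code). The Chi-square variable is drawn as a Gamma variable with shape nu/2 and scale 2, which is a standard identity. The name student_rand and the rng parameter are hypothetical.

```python
import math
import random


def student_rand(nu, rng=random):
    """Draw one Student's t value with nu > 0 degrees of freedom."""
    x = rng.gauss(0.0, 1.0)                # X  ~ Normal(0,1)
    x1 = rng.gammavariate(nu / 2.0, 2.0)   # X1 ~ Chi-square(nu)
    return x / math.sqrt(x1 / nu)


rng = random.Random(1)                      # fixed seed for repeatability
sample = [student_rand(10, rng) for _ in range(100000)]
mean = sum(sample) / len(sample)
var = sum((v - mean) ** 2 for v in sample) / (len(sample) - 1)
print(mean, var)                            # variance should be near 10/8
```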
See Also:
?Beta_Rand ?Exponential_Rand ?Multinomial_Rand ?Shuffle
?Binomial_Rand ?FDist_Rand ?Normal_Rand ?StatTest
?ChiSquare_Rand ?GammaDist_Rand ?Poisson_Rand ?Std_Score
?CreateRandSeq ?Geometric_Rand ?SetRand ?Zscore
?Cumulative ?Graph_Rand ?SetRandSeed
SubDist
Function SubDist( t:Tree, i:integer, j:integer )
Get the distance in PAM units from leaf i to leaf j
SubTree
Function SubTree( MinSquareTree:Tree, pam )
generates an expression sequence of SubTrees from a given
MinSquareTree at a specified pam distance
SurfIntActPred
Function SurfIntActPred( MulAlign:array(string), MinSquareTree )
Generates the prediction of surface, interior and active site positions in a multiple alignment.
SurfOut
Function SurfOut( SurfMatrix:array(array(array)), SurfMatrixTot:array(array) )
Returns for each position the SurfProb of being on the surface,
the number of variable subgroups at the specified MaxPW and SurfAA used to
determine SurfProb
Surface
Function Surface( Cluster:list(list(list)), MA:array(string), MaxPW:array, SurfAA:array, ActMatrixOut:array )
Reports the number of variable subgroups at defined PAM windows
in which at least one amino acid is of the type defined in SurfAA
SurfaceTot
Function SurfaceTot( SurfMatrix:array(array(array)) )
Reports the sum of the number of variable subgroups at defined
PAM windows and SurfAAs counted over all positions
SvdAnalysis
Function SvdAnalysis( AtA:matrix(numeric), btA:list(numeric), btb:numeric, NData:posint, names:list(string), svmin:{numeric,First(posint)} )
SvdAnalysis does a least squares approximation and returns various measures
of quality of the fit.
Problem: Given a matrix A (dim n x m) and a vector b (dim n),
we want to find a vector x (dim m) such that Ax ~ b.
This approximation is in the least squares sense,
i.e. ||Ax-b||^2 is minimum
The calling arguments are:
AtA is a matrix (dim m x m) which is the product A^t * A
btA is a vector (dim m) which is the product b^t * A
btb is the norm squared of b, i.e. ||b||^2 = b^t * b
NData is the number of data points (A is dim n x m)
names is a list (dim m) of the names associated with each
column of A, or with each value of x.
svmin is a positive numeric value. All singular values less than
svmin will not be used. Making svmin=0, all singular
values are used, and this is equivalent to pure least
squares. Alternatively, svmin can be the structure
First(k), where k is a positive integer not greater
than the dimension of AtA. In this case, the largest
k singular values will be used.
If the global variable ComputeSensitivity is set to false,
SvdAnalysis will not compute the sensitivity analysis and will
compute more quickly. For m > 100 this is highly recommended.
Output: The output is a darwin data structure
SvdResult( Norm2Err, SensitivityAnalysis, SingularValuesUsed,
SingularValuesDiscarded, Norm2Indep, MinNorm2Err,
SolutionVector, NData )
where:
Norm2Err is the norm squared of the resulting approximation, i.e.
||Ax-b||^2
SensitivityAnalysis is a list of 4-tuples with m entries, each one corresponding
to one variable. Each entry is [nnn,vvv,sss,ttt], where:
nnn is the name of the variable,
vvv is the result value (the x[i] value)
sss is an estimate of the standard deviation of vvv
ttt is the amount by which ||Ax-b||^2 will increase
if nnn were not used. To compute this
difference, all singular values are used.
The list is sorted by decreasing ttt
The list is only produced if the global variable
ComputeSensitivity is not set to false, otherwise it is
empty.
SingularValuesUsed is a list of the singular values used ( > svmin )
SingularValuesDiscarded is a list of the singular values discarded ( <= svmin )
Norm2Indep is simply btb, the norm squared of the independent variables,
the maximum norm that could be reached
MinNorm2Err is the value of ||Ax-b||^2 if all singular values were used,
i.e. is the minimum norm that could be achieved with
these m variables.
SolutionVector is the solution vector x
NData is the number of data points (A is of dimensions n x m)
A good summary explanation of the SVD analysis can be found in many
books; I like the one in Forsythe, Malcolm and Moler, Computer Methods
for Mathematical Computations.
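The three inputs AtA, btA and btb are sufficient because the residual can be evaluated without A or b themselves: ||Ax-b||^2 = x^t AtA x - 2 btA x + btb. The following Python sketch (not Darwin code) illustrates this for the pure least squares case, which corresponds to svmin=0. It solves the normal equations directly instead of using an SVD, and it omits the singular value filtering and the sensitivity analysis. The helper names solve and residual_norm2 and the small data set are made up for this illustration.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def residual_norm2(AtA, btA, btb, x):
    """Evaluate ||Ax-b||^2 from the products only."""
    m = len(x)
    quad = sum(x[i] * AtA[i][j] * x[j] for i in range(m) for j in range(m))
    return quad - 2 * sum(btA[i] * x[i] for i in range(m)) + btb


# Hypothetical data: fit b ~ x[0] + x[1]*t at t = 0, 1, 2.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 2.0, 2.0]
n, m = len(A), len(A[0])

# The quantities that would be passed as AtA, btA and btb.
AtA = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(m)]
       for i in range(m)]
btA = [sum(b[r] * A[r][j] for r in range(n)) for j in range(m)]
btb = sum(v * v for v in b)

x = solve(AtA, btA)                      # least squares solution
norm2_err = residual_norm2(AtA, btA, btb, x)
print(x, norm2_err)
```

Solving the normal equations directly is less robust than an SVD when AtA is close to singular, which is the situation the svmin cutoff is meant to handle.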
See Also: ?SvdBestBasis ?LSBestSum ?LSBestDelete ?LSBestSumDelete
SvdBestBasis
Function SvdBestBasis - Least squares by selecting best basis (subset)
Calling Sequence: SvdBestBasis(AtA,btA,btb,NData,names,k,svmin,try,startset)
Parameters:
Name Type Description
-------------------------------------------------------------------------
AtA matrix(m,m) the product of A^t * A
btA vector(m) the product b^t * A
btb numeric the norm squared of b, i.e. b*b
NData posint number of data points (dim A is n x m)
names list(string) names associated with each column of A
k posint number of variables in the solution
svmin numeric optional lower limit for using singular values
try posint optional, trials after a new local minimum
startset list(integer) optional, k column numbers to start
Returns:
SvdResult
Global Variables: SvdBestHash SvdBest_A SvdBest_d SvdGoodBases
SvdGoodPerms SvdHashSig Svd_svmin
Synopsis: SvdBestBasis finds the best set of k variables to do a least squares
fit. For k<=2 the result is the global minimum (and the variable "try"
is ignored), for k>2 this is a heuristic, not an exact algorithm, and its
precision depends on how many trials are performed. The problem of finding
the best set of variables, when done incrementally, one variable at a time,
is called Stepwise regression. The results of SvdBestBasis are generally
much better than those obtained by stepwise regression.
The problem is formally defined as follows: Given a matrix A (dim n x m)
and a vector b (dim n), we want to find a vector x (dim m) such that Ax ~
b, where x has k non-zero components and m-k zero components. This
approximation is in the least squares sense, i.e. |Ax-b|^2 is minimum.
The output is a SvdResult data structure. The global variable SvdGoodBases is
assigned a list of SvdResult data structures for all the other local minima
that are found. The global variable SvdGoodPerms is assigned a list of the
permutations of the variables which gave the good bases in SvdGoodBases.
SvdBestBasis prints information as it computes. The amount of information
printed can be regulated with printlevel.
svmin is an optional positive numeric value. All singular values less than
svmin will not be used. Making svmin=0, all singular values are used, and
this is equivalent to pure least squares. The selection of singular values
is used for the final computation of the SvdResult, not for the computation
of the best basis.
try is an optional integer. It indicates the number of trials that will be done
after a new local minimum is found before stopping. If omitted, 15 trials
are done after the lowest norm has been found.
startset is an optional list of k integers. SvdBestBasis will start its
search for an optimal basis from this set. If try is greater than 1, then other
trials, starting at random sets, will also be tried.
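To make the subset selection problem concrete, the Python sketch below (not Darwin code) tries every set of k columns and keeps the one with the smallest residual. This exhaustive search is only practical for small m and is not the heuristic that SvdBestBasis uses. The function names and the data are hypothetical. The sketch assumes that every k x k submatrix of AtA is non-singular.

```python
from itertools import combinations


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def best_basis(AtA, btA, btb, k):
    """Return (error, columns, coefficients) of the best k-column fit."""
    best = None
    for cols in combinations(range(len(btA)), k):
        sub = [[AtA[i][j] for j in cols] for i in cols]
        rhs = [btA[i] for i in cols]
        x = solve(sub, rhs)
        # At the least squares optimum, ||Ax-b||^2 = btb - (btA restricted) . x
        err = btb - sum(r * xi for r, xi in zip(rhs, x))
        if best is None or err < best[0]:
            best = (err, cols, x)
    return best


# Hypothetical data: b equals column 0 plus twice column 1 of
# A = [[1,0,0],[0,1,0],[0,0,1],[1,1,1]], with b = [1,2,0,3].
AtA = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
btA = [4.0, 5.0, 3.0]
btb = 14.0

err, cols, coeffs = best_basis(AtA, btA, btb, 2)
print(cols, coeffs, err)
```

The number of subsets grows as m choose k, which is why a heuristic with random restarts is used for larger problems.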
See Also:
?ExpFit ?LSBestSum ?Stat ?SvdReduceGood
?LSBestDelete ?LSBestSumDelete ?SvdAnalysis ?SvdResult
SvdResult
Class SvdResult - results of a least squares approximation, Ax=b
Template: SvdResult(Norm2Err,SensitivityAnalysis,SingularValuesUsed,
SingularValuesDiscarded,Norm2Indep,MinNorm2Err,SolutionVector,
NData)
Fields:
Name Type Description
--------------------------------------------------------------------------------
Norm2Err numeric norm of approximation |Ax-b|^2
SensitivityAnalysis list(list) results with sensitivity analysis
SingularValuesUsed list(numeric) singular values used
SingularValuesDiscarded list(numeric) singular values discarded
Norm2Indep numeric norm of independent variables, |b|^2
MinNorm2Err numeric |Ax-b|^2 if all sv were used
SolutionVector list(numeric) least squares solution, x
NData posint number of data points (dim A is n x m)
Methods: HTMLC print Rand SvdResult_type
Synopsis: An SvdResult holds the result of a linear least squares
approximation. Such an approximation is normally generated by SvdAnalysis
or SvdBestBasis. The list with the sensitivity results has 4 entries per
variable. These are the name of the variable, the result value (the x[i]
value), an estimate of the standard deviation and the amount by which
|Ax-b|^2 will increase if this variable were not used. To compute this
difference, all singular values are used. This list is sorted in decreasing
order of the last argument. The list is only produced if the global
variable ComputeSensitivity is not set to false, otherwise it is empty.
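The last entry of each sensitivity 4-tuple can be understood as the difference between two least squares fits, one with and one without the variable. The Python sketch below (not Darwin code) computes that difference by refitting with one column removed. How Darwin computes it internally is not stated here. The names min_error, const and slope, and the data, are hypothetical.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def min_error(AtA, btA, btb, cols):
    """Smallest ||Ax-b||^2 reachable using only the columns in cols."""
    cols = list(cols)
    sub = [[AtA[i][j] for j in cols] for i in cols]
    rhs = [btA[i] for i in cols]
    x = solve(sub, rhs)
    return btb - sum(r * xi for r, xi in zip(rhs, x))


# Hypothetical two-variable problem (constant term and slope).
names = ["const", "slope"]
AtA = [[3.0, 3.0], [3.0, 5.0]]
btA = [5.0, 6.0]
btb = 9.0
m = len(names)

full = min_error(AtA, btA, btb, range(m))
increase = []
for i in range(m):
    others = [j for j in range(m) if j != i]
    increase.append((names[i], min_error(AtA, btA, btb, others) - full))

# Same ordering as the SensitivityAnalysis list: decreasing increase.
increase.sort(key=lambda entry: -entry[1])
print(increase)
```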
See also: ?SvdAnalysis ?SvdBestBasis
Synteny
Function Synteny - find the number of inversions of a permutation
Calling Sequence: Synteny(perm,k)
Parameters:
Name Type Description
--------------------------------------------------
perm list(posint) a permutation
k posint (optional) effort to be done
Returns:
integer
Synopsis: Synteny finds an approximation to the minimum number of inversions
needed to transform the input permutation into a straight run (ascending or
descending). The input permutation is a list of the integers from 1 to n,
where n is the length of the list. An inversion operation is a modification
of a permutation which selects a particular contiguous range and swaps its
order. The problem of finding the synteny distance between two genomes can
be easily reduced to the problem of finding the number of inversions to
straighten the permutation. The parameter k gives the function a hint on
how much work should be done, it is the number of partial solutions that
will be kept during the search. The problem is NP-complete, so this
algorithm searches for a good approximate solution. The higher k, the more
work will be done. For a particular problem, the amount of work is
linear in k. By default k=10.
Examples:
> Synteny( [1,7,8,9,6,5,4,2,3] );
3
> Synteny( [4,5,6,1,2,3,7,8,9] );
3
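For short permutations the minimum number of inversions can be found exactly by breadth-first search, which is a useful way to check the definition. The Python sketch below (not Darwin code) does this. It is not the algorithm used by Synteny, which keeps a limited number of partial solutions, and its cost grows too quickly to be usable for long permutations. The function name inversion_distance is hypothetical.

```python
from collections import deque


def inversion_distance(perm):
    """Exact minimum number of segment reversals that turn perm into an
    ascending or a descending run (breadth-first search)."""
    start = tuple(perm)
    n = len(start)
    goals = {tuple(range(1, n + 1)), tuple(range(n, 0, -1))}
    if start in goals:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, dist = queue.popleft()
        # Reverse every contiguous range of length at least 2.
        for i in range(n - 1):
            for j in range(i + 2, n + 1):
                nxt = state[:i] + state[i:j][::-1] + state[j:]
                if nxt in goals:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None  # not reached for a valid permutation of 1..n


print(inversion_distance([4, 5, 6, 1, 2, 3, 7, 8, 9]))
```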
See Also:
?DrawTree ?LeastSquaresTree ?SignedSynteny
?GapTree ?PhylogeneticTree
SystemCommand
Function SystemCommand - execute a system command
Calling Sequence: SystemCommand(operation,addit_args)
Parameters:
Name Type Description
-----------------------------------------------------------
operation string the name of the system operation
addit_args string (optional) additional argument needed
Returns:
numeric
Synopsis: This command is provided to isolate system dependencies for
performing some operations which require execution of other, standard,
programs in the system. The optional additional arguments are dependent on
the operation and are typically file names on which the commands should be
run. The value returned is the integer value returned by the CallSystem
command that will run this operation. This command also allows for simple
customization for non-standard installations. In this case, the file lib/
SystemCommand may have to be extended with particular commands for your
system. The valid values for operation are:
HTML HTML viewer -- one additional parameter, the name of the file
which contains html source. The process should be
detached to allow stand-alone perusal.
postscript postscript viewer -- one additional parameter, the name of
the postscript file. (Usually a file ending in ".ps").
The process should detach to allow stand-alone perusal.
This is the command that will show all the darwin plots.
darwin darwin -- two additional parameters, the name of a file with
darwin input commands and the name of the file where the
output will be placed. The input file should end with a
"quit" command, else the spawned darwin will attempt to
read from the user once all the commands are
executed.
gimp picture processing software (could be gimp, photoshop or
something equivalent) -- one additional parameter, the
name of the file (typically a jpg, gif, ps or pdf)
rm remove file(s) -- one additional argument with the name(s) of
the file(s) to be removed. The removing is forced and
without questions asked.
maple the maple computer algebra system -- two additional
parameters, the name of a file with maple input commands
and the name of the file where the output will be placed.
Maple is run with option quiet to avoid unnecessary/
confusing output.
See also: ?CallSystem ?date ?hostname ?TimedCallSystem
TPIDistr
Function TPIDistr - distribution of number of changes in a sequence
Option: builtin
Calling Sequence: TPIDistr(a1,a2,a3,a4)
Parameters:
Name Type Description
-------------------------------------------------
a_i posint number of symbols of the ith type
Returns:
list(numeric)
Synopsis: The arguments (any number from 1 to 4) are taken to be the number
of symbols of each type. a1 is the number of symbols of type 1, a2 the
number of symbols of type 2, etc. TPIDistr computes the probability
distribution of the number of transitions in a random sequence with a1, a2,
... symbols of each type. This has a special application in computing the
TPI index (tRNA Pairing Index) which measures how autocorrelated are the
tRNAs that translate a given amino acid, independently of the frequencies of
the tRNAs and codons.
The distribution is returned in a list, and the first entry corresponds to 0
changes, the second to 1 change, etc. The number of changes can never
exceed a1+a2+...-1, so the list returned is of length a1+a2+...
For example, there are 3 ways of permuting 2 A's and one B. AAB, ABA and BAA.
Two sequences have one transition and one sequence has two transitions, so
the result in this case should be [0,2/3,1/3].
Examples:
> TPIDistr(1,2);
[0, 0.6667, 0.3333]
> TPIDistr(1,2,3,4);
[0, 0, 0, 0.00190476, 0.01714286, 0.08095238, 0.2167, 0.3310, 0.2671, 0.08523810]
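The distribution can be reproduced for small inputs by enumerating every distinct arrangement of the symbols and counting the positions where consecutive symbols differ. The Python sketch below (not Darwin code) does this with exact fractions. It is a brute-force illustration of the definition, not the method used by the builtin, and the name tpi_distr is hypothetical.

```python
from fractions import Fraction


def tpi_distr(*counts):
    """Distribution of the number of changes between consecutive symbols
    over all distinct arrangements with the given symbol counts."""
    n = sum(counts)
    tally = [0] * n            # tally[c] = arrangements with c changes
    remaining = list(counts)

    def extend(last, placed, changes):
        if placed == n:
            tally[changes] += 1
            return
        for sym in range(len(remaining)):
            if remaining[sym] > 0:
                remaining[sym] -= 1
                changed = last is not None and sym != last
                extend(sym, placed + 1, changes + (1 if changed else 0))
                remaining[sym] += 1

    extend(None, 0, 0)
    total = sum(tally)
    return [Fraction(c, total) for c in tally]


print(tpi_distr(1, 2))
print([float(p) for p in tpi_distr(1, 2, 3, 4)])
```

The enumeration visits every distinct arrangement, so it is only suitable for small counts.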
See also: ?ComputeTPI ?SetuptRNA
TT
Class TT - placeholder for text that should be displayed "as is"
Template: TT(string1,...)
Fields:
Name Type Description
---------------------------------------------
string1 string text to be displayed as is
Returns:
TT
Methods: HTMLC LaTeXC print string TT_type
Synopsis: The TT data structure holds text that is to be displayed using a
constant width font (like in a typewriter)
Examples:
> TT( 'for i to 10 do lprint(i^2) od');
TT(for i to 10 do lprint(i^2) od)
See Also:
?Block ?Document ?latex ?Roman
?Code ?HTML ?List ?RunDarwinSession
?Color ?HyperLink ?Paragraph ?screenwidth
?Copyright ?Indent ?PostscriptFigure ?Table
?DocEl ?LastUpdatedBy ?print ?View
Table
Class Table - structure to print/display tables
Template: Table(arg1,...,argn)
Fields:
Name Type Description
-------------------------------------------------------------------------------------------
arg1..n anything components of table in any order
center the entire table is centered
border the entire table is framed with a border
gutter=posint set gutter between columns
gutter=list(posint) set gutter for each individual column
ColAlign({string,p(posint)}...) set alignment for each individual column
RowAlign(string) set vertical alignment for following rows
('l', 'c' and 'r' for left, center, right)
Row(args) a row of data, each argument in a column
title=string title/caption to describe the table
Values(args) args to be distributed columnwise
rowwise uses Values(), but args are distributed rowwise
width=posint width of the table in characters
Rule draw a horizontal line
SpanPrevious possible argument of Row
Returns:
Table
Methods: HTMLC LaTeXC print string Table_type
Synopsis: The Table structure holds information describing a table (or
tabular information). This is expected to be laid out as a table either as
text, latex, html or something else. If a Row structure has an element with
the name 'SpanPrevious', then the previous entry will be expanded to occupy
also the space of this entry (like \multicolumn in latex or colspan in
html). The alignment inside the cells is set with ColAlign - either l
(left), r (right), c (center) or p(x) (paragraph with a fixed width of x
characters).
Examples:
> t := Table( center, border, gutter=4, Row('abc','cde'),Row(1,1e9)):
> print(t);
-----------------------
| abc cde |
| 1 1000000000 |
-----------------------
See Also:
?Block ?Document ?latex ?Roman
?Code ?HTML ?List ?RunDarwinSession
?Color ?HyperLink ?Paragraph ?screenwidth
?Copyright ?Indent ?PostscriptFigure ?TT
?DocEl ?LastUpdatedBy ?print ?View
TaxonId
Function TaxonId - SwissProt species code to NCBI TaxonId
Calling Sequence: TaxonId(org)
Parameters:
Name Type Description
--------------------------------------
org string SwissProt species code
Returns:
integer
Synopsis: Maps a SwissProt species code to the NCBI taxonomic identifier. If
the species code is not known, the function returns an error.
Examples:
> TaxonId('HUMAN');
9606
See also: ?SpeciesCode ?UpdateSpeciesCode
TaxonomyDownload
Function TaxonomyDownload - downloads the UniProt species taxonomy and
converts it to a Darwin readable format
Calling Sequence: TaxonomyDownload()
Returns:
NULL
Synopsis: Downloads the UniProt species taxonomy hierarchy from the UniProt
webpage and converts it to Darwin tables that are stored in the file
UniProtTaxonomy.drw, which is located in Darwin's data directory.
See also: ?SpeciesCode ?TaxonId ?TaxonomyEntry
TaxonomyEntry
Class TaxonomyEntry - data structure holding TaxonomyEntry information
Fields:
Name Type Description
----------------------------------------------------------------------------------------
id {integer,string} the id/name of the taxonomic level
Scientific Name string scientific name of level
Common Name string common name of level (or empty string)
Synonym string synonym name of level (or empty string)
Other names list(string) list of other names of level
Species code string the UniProt species identifier (or empty string)
Parent TaxonomyEntry the direct parent node in the taxonomy
Children list(TaxonomyEntry) the direct child nodes in the taxonomy
Lineage list(string) the lineage tree
Lineagestring string the lineage tree as one string ('; ' separated)
Methods: print Rand select string TaxonomyEntry_type
Synopsis: The TaxonomyEntry data structure gives easy access to the
different names, IDs and parent/children entries. The selectors are all
case insensitive. The constructor of this class accepts a taxonomic
identifier, a UniProt species identifier or a scientific species name and
returns the instance of the TaxonomyEntry data structure for the desired
taxonomic level.
Examples:
> t := TaxonomyEntry(9606);
t := TaxonomyEntry(9606)
> seq(z['sciname'], z= t['children']);
Homo sapiens neanderthalensis, Homo sapiens ssp. Denisova
> t['comname'];
Human
See also: ?SpeciesCode ?TaxonId ?TaxonomyDownload
TempName
Function TempName( )
Generate file names that can safely be used for a
temporary file. Optional arguments are: Dir = string and Prefix = string
which allow the user to control the choice of a directory and prefix.
TestGradHessian
Function TestGradHessian
Calling Sequence: TestGradHessian(f,f1,f2,point)
Parameters:
Name Type Description
----------------------------------------------------------------------
f procedure multivariate numerical function
f1 procedure gradient of f, returns a vector
f2 procedure hessian of f, returns a square matrix
point list(numeric) (optional) value at which to test
n posint (optional) dimension of argument of f
Tol Tolerance = positive (optional, default=100) error tolerance
Returns:
boolean
Synopsis: The TestGradHessian function is used to test whether the first and
second derivatives of a function are computed correctly. This test is run
at the given point (or at a random point if none is given). The arguments to f, f1
and f2 are vectors (lists) of dimension n. The output of f must be a
number, the output of f1 must be a list of numbers of dimension n (the
partial derivatives of f) and the output of f2 must be a matrix (n x n) with
the second partial derivatives of f.
TestGradHessian computes approximations to the gradient and the hessian by
computing f and f1 at various points. If the results are within 100 times
the minimal expected error, the function returns true, else it prints some
information about the failure and returns false. The error tolerance can be
changed from 100 to any desired number with the corresponding optional
argument.
Examples:
> f := proc(x) cos(x[1])*tan(x[2]) end:
> f1 := proc(x) [-sin(x[1])*tan(x[2]), cos(x[1])*(1+tan(x[2])^2)] end:
> f2 := proc(x) [[-cos(x[1])*tan(x[2]), -sin(x[1])*(1+tan(x[2])^2)],
[-sin(x[1])*(1+tan(x[2])^2), 2*cos(x[1])*tan(x[2])*(1+tan(x[2])^2)]] end:
> TestGradHessian(f,f1,f2,[0.3,0.5]);
true
> TestGradHessian(f,f1,f2,[0.3,0.5],Tolerance=0.4);
(2,1) second derivative, error too large: -3.97238e-12,
f1[1]p=-0.161445100295, f1[1]m=-0.16144174911,
(f1[1]p-f1[1]m)/h=-0.38371715, f2[2,1]=-0.38371715, h=8.73348e-06
err / (DBL_EPSILON*(|gp[j]|+|gm[j]|)/h) = -0.483891
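The idea behind the test can be sketched with central differences: the gradient is compared with differences of f, and each column of the hessian with differences of f1. The Python sketch below (not Darwin code) applies this to the function from the example above. It uses a fixed step and a plain absolute tolerance, which are assumptions made for this illustration. TestGradHessian itself scales the error by the expected rounding error, as the failure message shows. The name check_derivatives is hypothetical.

```python
import math


def check_derivatives(f, f1, f2, x, tol=1e-6, h=1e-5):
    """Compare f1 and f2 with central-difference approximations at x."""
    n = len(x)
    grad, hess = f1(x), f2(x)
    for i in range(n):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        # First derivative with respect to x[i], from values of f.
        if abs((f(xp) - f(xm)) / (2 * h) - grad[i]) > tol:
            return False
        # Second derivatives d f1[j] / d x[i], from values of f1.
        gp, gm = f1(xp), f1(xm)
        for j in range(n):
            if abs((gp[j] - gm[j]) / (2 * h) - hess[j][i]) > tol:
                return False
    return True


def f(x):
    return math.cos(x[0]) * math.tan(x[1])


def f1(x):
    sec2 = 1 + math.tan(x[1]) ** 2
    return [-math.sin(x[0]) * math.tan(x[1]), math.cos(x[0]) * sec2]


def f2(x):
    sec2 = 1 + math.tan(x[1]) ** 2
    return [[-math.cos(x[0]) * math.tan(x[1]), -math.sin(x[0]) * sec2],
            [-math.sin(x[0]) * sec2,
             2 * math.cos(x[0]) * math.tan(x[1]) * sec2]]


def f1_wrong(x):
    # Deliberately wrong sign in the first component.
    good = f1(x)
    return [-good[0], good[1]]


print(check_derivatives(f, f1, f2, [0.3, 0.5]))
print(check_derivatives(f, f1_wrong, f2, [0.3, 0.5]))
```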
See Also:
?BFGSMinimize ?MaxLikelihoodSize ?MinimizeFunc
?DisconMinimize ?Minimize2DFunc ?MinimizeSD
?MaximizeFunc ?MinimizeBrent ?NBody
TestStatResult
Class TestStatResult - result of a statistical test
Template: TestStatResult(name,TestStat,pvalue,pstd)
Fields:
Name Type Description
-----------------------------------------------------------------------
name string name of the statistical test
TestStat numeric test statistic computed from the data
pvalue numeric p-value (probability value)
pstd numeric p-value in standard deviations
plog numeric natural logarithm of the p-value
CountMatrix array(integer) count matrix, (optional, e.g. ChiSquare)
Methods: print Rand select string Table TestStatResult_type
Synopsis: A TestStatResult holds a result of a statistical test. It is
normally generated by the StatTest function. The pvalue depends on the
test, and in general it is the probability that such a result is obtained by
chance. Extreme values (very close to 0 or very close to 1) are hence very
rare. The pstd value measures the pvalue too. It is the number of standard
deviations away that the pvalue would be if it were a normally distributed
variable. It is useful to measure very extreme probabilities, where the
p-values may be out of precision. For extremely small values of the
p-value, the selector plog may be more practical, it records the natural
logarithm of the p-value. For the ChiSquare test the count matrix is
returned as the fifth field and is associated with the selector CountMatrix.
Besides the first four fields and the CountMatrix, the structure can hold any
number of additional arguments, which are test-dependent. These extra
arguments are typically of the form string=anything. TestStatResult prints
nicely via the print method. Any symbol of type string=anything will be
printed using the format (%s = %a). Any occurrences of _ in the string will
be replaced by a space. An example for this is Degrees_of_freedom=x in the
ChiSquare test.
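The relation between pvalue, pstd and plog can be illustrated with the standard normal distribution. The Python sketch below (not Darwin code) assumes that pstd is the standard normal quantile of the p-value, so that a p-value of 0.5 maps to 0 standard deviations. The sign convention and the treatment of one-sided versus two-sided tests in Darwin are not stated here and may differ. Both function names are hypothetical.

```python
import math
from statistics import NormalDist


def pstd_from_pvalue(p):
    """Standard normal quantile of p (assumed meaning of pstd).
    Requires 0 < p < 1."""
    return NormalDist().inv_cdf(p)


def plog_from_pvalue(p):
    """Natural logarithm of the p-value, as described for plog."""
    return math.log(p)


for p in (0.5, 0.975, 1e-6):
    print(p, pstd_from_pvalue(p), plog_from_pvalue(p))
```

For p-values that underflow in floating point, neither function can be evaluated from p itself, which is presumably why the structure stores pstd and plog as separate fields.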
See also: ?StatTest
TetrahedronGraph
Function TetrahedronGraph - generate graphs describing regular polyhedra
Calling Sequence: TetrahedronGraph()
HexahedronGraph()
OctahedronGraph()
IcosahedronGraph()
DodecahedronGraph()
Returns:
Graph
Synopsis: Generate a graph which corresponds to a regular polyhedron. That
is, a graph whose vertices correspond to the vertices of a regular
polyhedron, and whose edges correspond to its edges.
Examples:
> TetrahedronGraph();
Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,1,4),Edge(0,2,3),Edge(0,2,4),Edge(0,3,4)),Nodes(1,2,3,4))
> HexahedronGraph();
Graph(Edges(Edge(0,1,2),Edge(0,1,4),Edge(0,1,5),Edge(0,2,3),Edge(0,2,6),Edge(0,3,4),Edge(0,3,7),Edge(0,4,8),Edge(0,5,6),Edge(0,5,8),Edge(0,6,7),Edge(0,7,8)),Nodes(1,2,3,4,5,6,7,8))
> OctahedronGraph();
Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,1,4),Edge(0,1,5),Edge(0,2,3),Edge(0,2,5),Edge(0,2,6),Edge(0,3,4),Edge(0,3,6),Edge(0,4,5),Edge(0,4,6),Edge(0,5,6)),Nodes(1,2,3,4,5,6))
> IcosahedronGraph();
Graph(Edges(Edge(0,1,2),Edge(0,1,3),Edge(0,1,4),Edge(0,1,5),Edge(0,1,6),Edge(0,2,3),Edge(0,2,6),Edge(0,2,7),Edge(0,2,8),Edge(0,3,4),Edge(0,3,8),Edge(0,3,9),Edge(0,4,5),Edge(0,4,9),Edge(0,4,10),Edge(0,5,6),Edge(0,5,10),Edge(0,5,11),Edge(0,6,7),Edge(0,6,11),Edge(0,7,8),Edge(0,7,11),Edge(0,7,12),Edge(0,8,9),Edge(0,8,12),Edge(0,9,10),Edge(0,9,12),Edge(0,10,11),Edge(0,10,12),Edge(0,11,12)),Nodes(1,2,3,4,5,6,7,8,9,10,11,12))
> DodecahedronGraph();
Graph(Edges(Edge(0,1,2),Edge(0,1,5),Edge(0,1,6),Edge(0,2,3),Edge(0,2,8),Edge(0,3,4),Edge(0,3,10),Edge(0,4,5),Edge(0,4,12),Edge(0,5,14),Edge(0,6,7),Edge(0,6,15),Edge(0,7,8),Edge(0,7,16),Edge(0,8,9),Edge(0,9,10),Edge(0,9,17),Edge(0,10,11),Edge(0,11,12),Edge(0,11,18),Edge(0,12,13),Edge(0,13,14),Edge(0,13,19),Edge(0,14,15),Edge(0,15,20),Edge(0,16,17),Edge(0,16,20),Edge(0,17,18),Edge(0,18,19),Edge(0,19,20)),Nodes(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20))
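The tetrahedron is the simplest case: every pair of its four vertices is joined by an edge, so its graph is the complete graph on four nodes. The Python sketch below (not Darwin code) builds that edge list and formats it like the printed output above. The leading 0 in each Edge is copied from that output and its meaning is not interpreted here. The function names are hypothetical.

```python
from itertools import combinations


def complete_graph_edges(n):
    """All unordered pairs (i, j) with 1 <= i < j <= n."""
    return list(combinations(range(1, n + 1), 2))


def as_graph_text(edges, n):
    """Format edges in the style of the printed Graph structure."""
    edge_txt = ",".join("Edge(0,{},{})".format(i, j) for i, j in edges)
    node_txt = ",".join(str(v) for v in range(1, n + 1))
    return "Graph(Edges({}),Nodes({}))".format(edge_txt, node_txt)


tetra_edges = complete_graph_edges(4)
tetra_text = as_graph_text(tetra_edges, 4)
print(tetra_text)
```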
See Also:
?BipartiteGraph ?Graph_minus ?Nodes
?Clique ?Graph_Rand ?ParseDimacsGraph
?DrawGraph ?Graph_XGMML ?Path
?Edge ?InduceGraph ?RegularGraph
?EdgeComplement ?MaxCut ?ShortestPath
?Edges ?MaxEdgeWeightClique ?VertexCover
?FindConnectedComponents ?MinCut
?Graph ?MST
TextBlock
Class TextBlock - builds a named block around content
Template: TextBlock(blockname,content1,...)
Returns:
TextBlock
Fields:
Name Type Description
--------------------------------------------------------------
blockname string the name of the block
content_i {string,structure} the text content of the block
Methods: HTMLC LaTeXC string TextBlock_type
Synopsis: A TextBlock is only meaningful in the context of a structured
output format such as LaTeX or (X)HTML. If used in a normal print statement,
TextBlock will just output the content parameters. If used in a LaTeXC
statement, TextBlock will create an environment called 'blockname' around
the content.
Examples:
> b := TextBlock( 'abstract', 'This is my funny abstract.' );
b := TextBlock(abstract,This is my funny abstract.)
> print(b);
This is my funny abstract.
> prints(LaTeXC(b));
\begin{abstract}This is my funny abstract.\end{abstract}
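The two behaviours shown in the example can be mimicked in a few lines. The Python sketch below (not Darwin code) returns the bare content for plain printing and wraps the content in a LaTeX environment otherwise. It handles string content only, and both function names are hypothetical.

```python
def plain_block(blockname, *content):
    """Plain output: the block name is ignored, content is passed through."""
    return "".join(str(c) for c in content)


def latex_block(blockname, *content):
    """LaTeX output: content wrapped in an environment named blockname."""
    body = "".join(str(c) for c in content)
    return "\\begin{" + blockname + "}" + body + "\\end{" + blockname + "}"


print(plain_block("abstract", "This is my funny abstract."))
print(latex_block("abstract", "This is my funny abstract."))
```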
See Also:
?Block ?HTML ?Paragraph ?Table
?Code ?HyperLink ?PostscriptFigure ?TT
?Color ?Indent ?print ?View
?Copyright ?LastUpdatedBy ?Roman
?DocEl ?latex ?RunDarwinSession
?Document ?List ?screenwidth
TextHead
Function TextHead - Find the beginning of a string
Option: builtin
Calling Sequence: TextHead(x)
Parameters:
Name Type Description
-----------------------------------
x string an arbitrary string
Returns:
integer
Synopsis: Returns the offset to be added to x (on the left) to obtain the
first character of the string containing x.
Examples:
> a := 'CYQQSVWPFMDYQQFQGFSWKMPLGNNH';
a := CYQQSVWPFMDYQQFQGFSWKMPLGNNH
> a1 := a[10..20];
a1 := MDYQQFQGFSW
> TextHead(a1);
-9
> TextHead(a1)+a1;
CYQQSVWPFMDYQQFQGFSW
See also: ?GetOffset ?TextHandling
TimedCallSystem
Function TimedCallSystem
Option: builtin
Calling Sequence: TimedCallSystem(cmd)
TimedCallSystem(cmd,timeout)
Parameters:
Name Type
-----------------------------------------------
cmd a string containing a system command
timeout an optional integer number of seconds
Returns:
[integer, string] : return code and result of command
Synopsis: The "cmd" argument is passed to the underlying operating system.
If the optional "timeout" argument is specified, Darwin allows for "timeout"
seconds of execution. If the command does not terminate in the allocated
time, it is killed and the TimedCallSystem returns [-1, '(Timeout)'],
otherwise it returns a list consisting of the execution return code value
returned by the operating system and the output generated by cmd. The
output is returned as a string. It will normally be ended with a newline
character. Normally, a return code 0 indicates successful execution.
Examples:
> TimedCallSystem(date,10);
[0, Tue Feb 19 10:54:49 CET 2013
]
> TimedCallSystem('sleep 5',3);
[-1, (Timeout)]
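The return convention can be reproduced with a standard process library. The Python sketch below (not Darwin code) runs a command with an optional timeout and returns the same kind of two-element list. It splits the command line itself instead of passing it to a shell, and it merges standard error into the output. Both choices are assumptions of this sketch, not statements about Darwin. The name timed_call is hypothetical.

```python
import shlex
import subprocess
import sys


def timed_call(cmd, timeout=None):
    """Run cmd and return [return code, output], or [-1, '(Timeout)']."""
    try:
        done = subprocess.run(shlex.split(cmd),
                              stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return [-1, "(Timeout)"]
    return [done.returncode, done.stdout]


# Use the running Python interpreter as a portable test command.
quick = timed_call('"{}" -c "print(42)"'.format(sys.executable), 10)
slow = timed_call('"{}" -c "import time; time.sleep(5)"'.format(sys.executable), 1)
print(quick, slow)
```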
See also: ?CallSystem ?SystemCommand ?time ?UTCTime
TotalAlign
Function TotalAlign
Option: builtin
Calling Sequence: TotalAlign(m,DM,goal)
Parameters:
Name Type Description
------------------------------------
m Match a Match
DM DayMatrix a Dayhoff Matrix
goal numeric a threshold value
Returns:
list(Match)
Synopsis: The TotalAlign function implements the Smith-Waterman algorithm
SmithW81 with an extension to find all independent local alignments of the
complete sequences of 'm' reaching a score of at least 'goal'. The
alignments are computed at PAM distance defined by the similarity matrix DM.
Examples:
See also: ?CreateDayMatrices ?MAlign
TotalTreeWeight
Function TotalTreeWeight( t:Tree )
Returns the sum of the lengths of all branches in PAM units
Transcribe
Function Transcribe - DNA to RNA
Calling Sequence: Transcribe(dna)
Parameters:
Name Type Description
-------------------------------
dna string string of bases
Returns:
string
Synopsis: Replaces all T with U.
Examples:
> Transcribe('ATG');
AUG
See also: ?BackTranscribe ?Translate
Translate
Function Translate - DNA to Protein
Calling Sequence: Translate(dna)
Parameters:
Name Type Description
-----------------------------------------
dna string sequence to be translated
Returns:
string
Synopsis: Translate a DNA sequence into a protein sequence.
Examples:
> Translate('ATGAAATTTTAA');
MKF
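The two operations can be sketched as a character replacement and a codon table lookup. The Python sketch below (not Darwin code) uses the standard genetic code and stops at the first stop codon, which is an assumption chosen to agree with the example above, where the final TAA does not appear in the result. Ambiguous bases, lower case input and alternative genetic codes are not handled. The names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """DNA to RNA: replace every T with U."""
    return dna.replace("T", "U")


def translate(dna):
    """Translate codon by codon until the first stop codon or the end."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[pos:pos + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


print(transcribe("ATG"))
print(translate("ATGAAATTTTAA"))
```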
See also: ?BackTranslate ?Transcribe
Tree
Class Tree - Internal node of a binary Tree
Template: Tree(Left,Height,Right,xtra)
Fields:
Name Type Description
--------------------------------------------------------
Left {Leaf,Tree} recursive left subtree
Right {Leaf,Tree} recursive right subtree
Height anything any information, usually height
xtra anything (optional) additional information
Returns:
Tree
Methods: GetPartitions Graph GraphR matrix Newick Rand select
Signature Tree_type
Synopsis: The Tree data structure holds binary trees which may or may not be
labelled and/or weighted. The Left and Right subtree of the tree are either
(1) a Tree structure or (2) a Leaf structure. Many built-in Darwin routines
for phylogenetic trees assume that the Height field refers to the height of
the node. These routines include DrawTree. The use of the xtra field
varies significantly from algorithm to algorithm.
Examples:
> t := Tree(Tree(Leaf(A),5,Leaf(B)),0,Tree(Leaf(C),11,Leaf(D)));
t := Tree(Tree(Leaf(A),5,Leaf(B)),0,Tree(Leaf(C),11,Leaf(D)))
> t[Left];
Tree(Leaf(A),5,Leaf(B))
> t[Right];
Tree(Leaf(C),11,Leaf(D))
See Also:
?BipartiteSquared ?IntraDistance ?Prefix
?BootstrapTree ?Leaf ?RBFS_Tree
?ComputeDimensionlessFit ?LeastSquaresTree ?ReconcileTree
?DrawTree ?Leaves ?RobinsonFoulds
?GapTree ?PhylogeneticTree ?SignedSynteny
?Infix ?Postfix ?Synteny
TreeAngles
Function TreeAngles( g:Graph )
Find angles for edges of g (being a tree) in order to draw it.
TreeConstruction
Data structure TreeConstruction( )
Function: creates a gap heuristic data structure
Selectors:
Algorithm: Type string
Method of tree construction.
PROB: probabilistic model (MinSquareTree)
TSP: TSP method
PHYLIP: an algorithm from the phylip package
default: PROB
Method: Type string
If method is TSP, then type describes what kind of TSP method to use.
The type describes which leaves to connect in the connection step.
Possible values are:
- LINEAR: smallest "swapping" error (order n)
The next three methods calculate ALL errors per step(order n^2).
- MINSQUARE: choose leaves with minimum sum of square of errors
- AVERAGE: smallest average error
- MINMAX: minimal maximal error
- TREE: minimal tree fitting index of subtree
constructs a MinSquareTree of each subtree
- DOUBLETSP: use TSP again to find *another* circular
order and connect the leaves that were swapped.
Is of order n^3 of course in each step
... so about n^4 in total
Phylip package:
- NEIGHBOR
- KITSCH
- FITCH
default: LINEAR
Relative: Type boolean
true if relative error should be considered.
false if absolute error should be used.
default: true
Simultan: Type real
values < 0 mean only do one connection at a time
Otherwise it is the maximal relative error up to
which connections are made.
range: -1, 0.0 - 1.0
default: 0.1
Dynamic: Type real
values < 0 mean do NOT use dynamic programming.
Otherwise it is the maximal relative error up to which
connections are made.
range: -1, 0.0 - 1.0
default: 0.2
AdjustEps: type boolean
true if the maximal error (param. dynamic) should be adjusted, if
the smallest error is larger than the error specified, or if all
errors are smaller than the error specified.
false if errors should not be changed.
Default: true
Maxbranch: Type real
Maximum number relative to n (nr of leaves) up to which connections
should be kept, rounded to the next bigger integer.
Only used if Dynamic is > 0.
range: -1, 0.0 - 1.0
default: 4
Minbranch: Type real
Used if Dynamic > 0.
Determines how many connections should be considered in one step
in ANY case, even if the error is too large.
Values between 0.0 and 1.0 are relative to n (nr of leaves).
Values > 1 are absolute values
There is always at least ONE connection.
range: 0.0 - 1.0, positive integer
default value: 1
Limit: Type real
Max. number of trees to keep in memory, if Dynamic is > 0
-1 means no limit.
The number is relative to n, the number of leaves.
range: -1, positive integer
default: 3
Data: anything
could be an array of statistics or any other information
MSAScores: boolean
true: uses the scores calculated from the MSA to reconstruct tree
false: uses scores from the allall to reconstruct the tree
default: false
Scoring: String
Scoring of trees.
Can be PAM, SCORE, ERROR, INDEX, COMBINED
PAM: tree w. smallest PAM distance is best tree
SCORE: tree w. largest SCORE is best tree
ERROR: tree w. smallest turn-error is best tree
INDEX: tree w. smallest fitting index is best tree
MSA: tree w. best associated msa is best tree
default: PAM
Datatype: String
Data used for tree construction.
Can be PAM, SCORE
PAM: use PAM distances instead of scores
SCORE: use scores and not PAM distances
default: PAM
TreeResult
Class TreeResult - the result of a tree reconstruction call
Template: TreeResult(Tree,Type,Other)
Fields:
Name Type Description
---------------------------------------------------------------------------------------
Tree Tree the maximum likelihood tree
Type string type of reconstruction (ML/Distance/Parsimony/Other)
Name string (opt) arbitrary name to identify the tree
Likelihood numeric (opt) log(Likelihood) for ML trees
Alpha numeric (opt) alpha parameter of Gamma correction
InvSites numeric (opt) invariant sites
BaseFreqs list(numeric) (opt) base frequencies
SubstModel string (opt) substitution model
Method string (opt) name of the function used to build the tree
CPUtime numeric (opt) seconds used to build the tree
LSError nonnegative (opt) Weighted branch length errors (Distance)
CharChanges integer (opt) Number of character changes needed
(Parsimony)
LnLperSite list(numeric) (opt) List of loglikelihood values per site
Methods: print Rand select string TreeResult_type
Synopsis: A TreeResult stores the result of a maximum likelihood tree
reconstruction. Parameters that have not been estimated are unassigned.
See also: ?PhyML ?RAxML ?RellTree ?Tree
TreeSize
Function TreeSize - Number of leaves in a tree
Calling Sequence: TreeSize(t)
Parameters:
Name Type Description
-------------------------
t Tree a Tree
Returns:
integer
Synopsis: Traverses a tree and returns the number of leaves.
Examples:
> t := Rand(Tree):
> TreeSize(t);
12
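Counting leaves is a direct recursion over the Left and Right fields. The Python sketch below (not Darwin code) mirrors the Tree(Left,Height,Right) layout with two small classes and counts the leaves of the four-leaf tree used in the Tree entry. The class definitions and the function name tree_size are hypothetical stand-ins for the Darwin structures.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class Leaf:
    label: Any


@dataclass
class Tree:
    left: Union["Leaf", "Tree"]
    height: Any
    right: Union["Leaf", "Tree"]


def tree_size(t):
    """Number of leaves below (and including) node t."""
    if isinstance(t, Leaf):
        return 1
    return tree_size(t.left) + tree_size(t.right)


t = Tree(Tree(Leaf("A"), 5, Leaf("B")), 0, Tree(Leaf("C"), 11, Leaf("D")))
print(tree_size(t))
```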
See also: ?CenterTreeRoot ?RotateTree
TreeStatistics
Data structure TreeStatistics( )
Data structure that keeps statistical data about tree constructions and methods
Selectors:
Type: Tree
Information on the Tree that was used
Construction: TreeConstruction
Information about the TreeConstruction type that was used
Real: Integer
Number of exact tree constructions (in position 1)
Prob: Integer
Number of trees that were the same as the tree calculated
by the probabilistic model
Total: Integer
Total number of trees constructed
Time: Stat()
Construction time
Position: Stat()
Position of the real tree in the list of constructed trees
1 is optimal
Error: Stat()
Average error for each connection step
Number: Stat()
Average number of trees at the end of construction
Index: Stat()
Tree fitting index
Deltaindex: Stat()
Difference of tree fitting index of real tree and constructed tree
Topology: Stat()
Average topology distance of trees
Name: string
Name/Title of these statistics
Found: Integer
How often was the tree found (anywhere)
Notfound: Integer
Goodindex: Integer
If tree was not found:
how often was index larger than that of real tree (-> good measure)
Goodpam: Integer
If tree was not found:
how often was total pam distance larger than that of real tree (-> good measure)
Goodscore: Integer
If tree was not found:
how often was score smaller than that of real tree (-> good measure)
Goodmsa: Integer
If tree was not found:
how often was score of the msa smaller than that of real tree
(-> good measure)
Msa: Numeric
Difference in Score of real msa minus score of calculated msa of constructed
tree
TreeToPam
Function TreeToPam( tree )
Returns an expression sequence which contains the PAM distances
of the leaves of a tree (or a leaf)
Tree_Graph
Function Tree_Graph( no:Tree )
Convert a binary tree into a graph (unrooted tree).
Tree_matrix
Function Tree_matrix - Distance matrix induced from a Tree
Calling Sequence: Tree_matrix(t)
Tree_matrix(t,leaves)
Parameters:
Name Type Description
------------------------------------------------------------------
t Tree the given tree
leaves {list,procedure,table} (optional) leaf to index mapping
Returns:
matrix(nonnegative)
Synopsis: This function extracts the pairwise distances between any two
leaves on a tree and returns them in a distance matrix. If the optional
'leaves' argument is not provided, the 'Label' or 3rd field of the Leaf
datastructures have to contain the indices to the matrix. Otherwise, the
leaves argument has to be either a list of leaf labels, a table pointing
from labels to indices or a function returning for a leaf datastructure the
appropriate index.
Examples:
> t := Tree(Leaf(A,1.2),0,Tree(Leaf(B,1.8),0.9,Leaf(C,1.4)));
t := Tree(Leaf(A,1.2000),0,Tree(Leaf(B,1.8000),0.9000,Leaf(C,1.4000)))
> Tree_matrix(t,[A,B,C]);
[[0, 3, 2.6000], [3, 0, 1.4000], [2.6000, 1.4000, 0]]
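The following is a rough Python sketch (not Darwin code) of how such a matrix can be derived. The `Leaf` and `Tree` classes are hypothetical stand-ins for the Darwin structures, and the rule that a branch length equals the absolute difference of the heights at its two ends is an assumption inferred from the example above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:                 # hypothetical stand-in for Darwin's Leaf(Label, Height)
    label: str
    height: float


@dataclass
class Tree:                 # hypothetical stand-in for Darwin's Tree(Left, Height, Right)
    left: "Node"
    height: float
    right: "Node"


Node = Union[Leaf, Tree]


def leaf_depths(node, parent_height, dist):
    """Map each leaf label below `node` to its path length from the parent."""
    here = dist + abs(node.height - parent_height)   # assumed branch length
    if isinstance(node, Leaf):
        return {node.label: here}
    out = leaf_depths(node.left, node.height, here)
    out.update(leaf_depths(node.right, node.height, here))
    return out


def tree_matrix(t, leaves):
    """Pairwise leaf distances, indexed by position in `leaves`."""
    index = {label: i for i, label in enumerate(leaves)}
    m = [[0.0] * len(leaves) for _ in leaves]

    def visit(node):
        if isinstance(node, Leaf):
            return
        # Every pair split by this node is joined through this node.
        left = leaf_depths(node.left, node.height, 0.0)
        right = leaf_depths(node.right, node.height, 0.0)
        for a, da in left.items():
            for b, db in right.items():
                m[index[a]][index[b]] = m[index[b]][index[a]] = da + db
        visit(node.left)
        visit(node.right)

    visit(t)
    return m


t = Tree(Leaf("A", 1.2), 0, Tree(Leaf("B", 1.8), 0.9, Leaf("C", 1.4)))
matrix = tree_matrix(t, ["A", "B", "C"])
```

The sketch recomputes leaf depths at every internal node, which is simple but quadratic; it is meant to show the definition, not an efficient implementation.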
See also: ?CreateArray ?Leaf ?LeastSquaresTree ?PhylogeneticTree ?Tree
UTCTime
Function UTCTime - UTC time in seconds or wall-clock time of evaluation
Option: builtin
Calling Sequence: UTCTime()
UTCTime(expr)
Parameters:
Name Type
-----------------
expr expression
Returns:
numeric
Synopsis: This function returns the total wall-clock time taken to evaluate
the expression expr. When no expression is passed, it returns the number of
seconds since 00:00:00 GMT, January 1, 1970. This is called UTC time or
Coordinated Universal Time.
Examples:
> UTCTime();
1361267692.2392
> UTCTime(log10(factorial(100)));
5.0068e-06
> UTCTime( CallSystem('sleep 2') );
2.0325
See also: ?date ?time ?TimedCallSystem
UnassignGlobals
Function UnassignGlobals - unassigns all global variables from a given
function
Calling Sequence: UnassignGlobals(func)
UnassignGlobals(func,ex)
Parameters:
Name Type Description
----------------------------------------
func procedure the function
ex set (optional) exceptions
Returns:
NULL
Synopsis: UnassignGlobals unassigns all global variables that are set by a
given function. The optional second argument allows the user to define a set
of variables that should be excluded from this.
Examples:
> Clique(TetrahedronGraph());
{1,2,3,4}
> CliqueUpperBound;
4
> UnassignGlobals(Clique);
> CliqueUpperBound;
CliqueUpperBound
See also: ?Globals
UnionFind
Class UnionFind - Implementation of the Union-Find data structure and
algorithm
Template: UnionFind(Elements)
UnionFind()
Fields:
Name Type Description
----------------------------------------------------------------------
Elements {list,list(set)} (optional) initial elements
Clusters list(set) sets resulting from the union operations
Returns:
UnionFind
Methods: plus print select string union UnionFind_type
Synopsis: The Union-Find data structure allows one to repeatedly join two
sets. A sequence of m union/find operations, in any order, on n elements
takes O(log(n)*m*a(m,n)) time, where a(m,n) is the inverse Ackermann
function, thus close to O(log(n)) per operation.
Sets can be unified by performing a union operation on the UnionFind data
structure and a list containing two elements, one from each of the two sets.
New sets can be added to the data structure using the plus function.
References: Algorithmen und Datenstrukturen, T. Ottmann and P. Widmayer,
Spektrum, Akad. Verl., 1996
Examples:
> uf := UnionFind([{22,14,31},{12,41,23},{4},{99,25}]):
> union(uf,[14,99]):
> uf[Clusters];
[{4}, {12,23,41}, {14,22,25,31,99}]
> uf + {33,2,6}:
> union(uf, [2,4]):
> uf[Clusters];
[{12,23,41}, {14,22,25,31,99}, {2,4,6,33}]
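For readers unfamiliar with the technique, here is a minimal Python sketch (not Darwin code) of a union-find structure with union by size and path compression. The class and method names are hypothetical, and the ordering of the reported clusters is a choice made for this sketch, not necessarily the ordering Darwin uses.

```python
class UnionFindSketch:
    """Illustrative union-find; names are hypothetical, not Darwin's API."""

    def __init__(self, clusters=()):
        self.parent = {}
        self.size = {}
        for cluster in clusters:
            self.add(cluster)

    def add(self, cluster):
        """Add a new set (the role played by `+` in the Darwin class)."""
        members = list(cluster)
        for e in members:
            self.parent[e] = e
            self.size[e] = 1
        for e in members[1:]:
            self.union(members[0], e)

    def find(self, e):
        """Return the representative of e's set, compressing the path."""
        root = e
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[e] != root:
            nxt = self.parent[e]
            self.parent[e] = root
            e = nxt
        return root

    def union(self, a, b):
        """Join the sets containing a and b (smaller tree under larger)."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

    def clusters(self):
        """Current sets, sorted by their smallest element."""
        groups = {}
        for e in self.parent:
            groups.setdefault(self.find(e), set()).add(e)
        return sorted(groups.values(), key=min)


uf = UnionFindSketch([{22, 14, 31}, {12, 41, 23}, {4}, {99, 25}])
uf.union(14, 99)
first = uf.clusters()
uf.add({33, 2, 6})
uf.union(2, 4)
second = uf.clusters()
```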
UpdateSpeciesCode
Function UpdateSpeciesCode - downloads the SwissProt-NCBI species mapping
Calling Sequence: UpdateSpeciesCode()
Synopsis: Downloads the mapping between the SwissProt species codes and the
NCBI taxonomic identifiers from http://www.expasy.ch/cgi-bin/speclist and
converts it into a Darwin readable file called speciescode.drw which is
located in Darwin's data directory.
See also: ?SpeciesCode ?TaxonId
UpdateStat
Function UpdateStat - Add sample point to Stat Data Structure
Calling Sequence: UpdateStat(name,number)
Parameters:
Name Type Description
-----------------------------------------------------------
name Stat Stat data structure to be updated
number numeric value to be added to Stat data structure
Returns:
Stat
Synopsis: UpdateStat is used to add a sample point to an existing Stat data
structure.
Examples:
> BooHoo := Stat('Stock Market Losses'):
> UpdateStat( BooHoo, 10000 ):
> UpdateStat( BooHoo, 30000 ):
> BooHoo[Mean];
20000
> BooHoo[Number];
2
> print(BooHoo);
Stock Market Losses: number of sample points=2
mean = 20000 +- 19600
variance = 200000000 +- 999999
skewness=999999, excess=999999
minimum=10000, maximum=30000
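The accumulation behind the numbers above can be sketched in Python (not Darwin code). The class name is hypothetical, and reading the "+- 19600" figure as a 95% confidence half-width, 1.96*sqrt(variance/n), is an assumption that happens to match the printed output.

```python
import math


class RunningStat:
    """Illustrative accumulator; not the Darwin Stat implementation."""

    def __init__(self, name):
        self.name = name
        self.number = 0
        self.total = 0.0
        self.total_sq = 0.0
        self.minimum = math.inf
        self.maximum = -math.inf

    def update(self, x):
        """Add one sample point and return the accumulator."""
        self.number += 1
        self.total += x
        self.total_sq += x * x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        return self

    @property
    def mean(self):
        return self.total / self.number

    @property
    def variance(self):
        # Unbiased sample variance; needs at least two points.
        n = self.number
        return (self.total_sq - self.total ** 2 / n) / (n - 1)

    def mean_halfwidth(self):
        # Assumed 95% confidence half-width of the mean.
        return 1.96 * math.sqrt(self.variance / self.number)


losses = RunningStat("Stock Market Losses")
losses.update(10000).update(30000)
```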
See Also:
?CollectStat ?ExpFit ?LinearRegression ?Stat
?Counter ?ExpFit2 ?OutsideBounds
VertexCover
Function VertexCover - Vertex Cover exact/approximate algorithm
Option: builtin
Calling Sequence: VertexCover(A)
Parameters:
Name Type Description
--------------------------
A Graph a Graph
Returns:
set
Synopsis: The input to this algorithm is an undirected graph. An undirected
graph is represented as a Graph data structure which should accept two
selectors: Nodes and Edges. The Vertex Cover problem is finding the minimum
set of vertices which "cover" all edges. That is a minimum size set of
vertices such that each edge is incident to at least one of the vertices in
this set.
The output is a set of the Nodes in the vertex cover. The algorithm computes
a lower bound on the size of the vertex cover which is left in the global
variable VertexCoverLowerBound. If this coincides with the size of the
answer, it means that the answer is optimal. The global variable
VertexCoverIterFactor may be assigned a non-negative number f. The
algorithm will then run for f*n^2 iterations. If f=0 then only the greedy
heuristic is run, and this is quite fast. The larger f, the more accurate
the answers will be, and the more time the algorithm will consume.
The Vertex Cover problem is closely related to the Clique problem. They can
be related by the following formula:
VertexCover(G) = NodeComplement(Clique(EdgeComplement(G)))
Examples:
> VertexCover(PetersenGraph());
{1,2,3,6,8,10}
> VertexCoverLowerBound;
6
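The relation between Vertex Cover and Clique stated above can be checked on small graphs with a brute-force Python sketch (not Darwin code). All function names are hypothetical, and the exhaustive search is only practical for a handful of nodes; it is not the algorithm Darwin uses.

```python
from itertools import combinations


def min_vertex_cover(nodes, edges):
    """Smallest node set touching every edge (exhaustive search)."""
    for k in range(len(nodes) + 1):
        for cand in combinations(sorted(nodes), k):
            chosen = set(cand)
            if all(u in chosen or v in chosen for u, v in edges):
                return chosen
    return set(nodes)


def max_clique(nodes, edges):
    """Largest node set whose members are pairwise adjacent."""
    adj = {frozenset(e) for e in edges}
    for k in range(len(nodes), 0, -1):
        for cand in combinations(sorted(nodes), k):
            if all(frozenset(p) in adj for p in combinations(cand, 2)):
                return set(cand)
    return set()


def edge_complement(nodes, edges):
    """Edges between exactly those node pairs that are not adjacent."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(sorted(nodes), 2)
            if frozenset((u, v)) not in present]


# A 5-cycle: any minimum cover has 3 nodes.
nodes = {1, 2, 3, 4, 5}
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
cover = min_vertex_cover(nodes, edges)
via_clique = nodes - max_clique(nodes, edge_complement(nodes, edges))
```

Several optimal covers may exist, so the two routes are only guaranteed to agree in size, not in the exact node set returned.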
See Also:
?BipartiteGraph ?Graph_minus ?Nodes
?Clique ?Graph_Rand ?ParseDimacsGraph
?DrawGraph ?Graph_XGMML ?Path
?Edge ?InduceGraph ?RegularGraph
?EdgeComplement ?MaxCut ?ShortestPath
?Edges ?MaxEdgeWeightClique ?TetrahedronGraph
?FindConnectedComponents ?MinCut
?Graph ?MST
View
Function View - show an object on the screen in a visual way
Option: polymorphic
Calling Sequence: View(t)
Parameters:
Name Type Description
-------------------------------------------
t anything an object to be displayed
Returns:
NULL
Synopsis: This function attempts to display an object in a visual way. If
the object is an HTML file, a browser will be called. If it is a plot, a
postscript viewer will be called. If it is a LaTeX file, the xdvi viewer
will be called. This function is very system dependent: it works only on
Unix/Linux, and assumes that the underlying programs are available.
Examples:
> View(Histogram(data));
> View(HTML(doc));
See Also:
?Block ?Document ?latex ?Roman
?Code ?HTML ?List ?RunDarwinSession
?Color ?HyperLink ?Paragraph ?screenwidth
?Copyright ?Indent ?PostscriptFigure ?Table
?DocEl ?LastUpdatedBy ?print ?TT
ViewPlot
Function ViewPlot - run a viewer on a plot just created
Calling Sequence: ViewPlot()
Returns:
NULL
Synopsis: Start ghostview showing the current output of DrawPlot() or most
other plotting commands, in the proper orientation.
See Also:
?BrightenColor ?DrawPlot ?PlotArguments
?ColorPalette ?DrawPointDistribution ?Set
?DrawDistribution ?DrawStackedBar ?SmoothData
?DrawDotplot ?DrawTree ?StartOverlayPlot
?DrawGraph ?GetColorMap ?StopOverlayPlot
?DrawHistogram ?Plot2Gif
VisualizeProtein
Function VisualizeProtein( ms:list(NucPepMatch) )
Visualize the alignment of a protein with all its homologue genes.
WeightObservations
Function WeightObservations - Weight data for least squares analysis
Calling Sequence: WeightObservations(A,b,w)
Parameters:
Name Type Description
---------------------------------------------------------------
A matrix(numeric) n rows of m-dimensional data vectors
b array(numeric) n-dimensional vector of dependent data
w array(numeric) n-dimensional vector of weights
Returns:
[ AtA:matrix(numeric), btA:array(numeric), btb:numeric ]
Synopsis: Prepare matrices and vectors used for least squares approximations
with given weights. Given the matrix A (dim n x m) and the vector b (dim n)
a least squares solution searches a vector x, such that Ax ~ b, or |Ax-b| is
minimal in some sense. A weighted least squares problem is equivalent to
the above, except that every squared error is weighted by a (non-negative)
factor w[i]. This is equivalent to minimizing | W*(Ax-b) | (where W is a
diagonal matrix holding the square roots of the weights). In simpler terms,
if a weight w[i] is an integer, then
considering the weight is equivalent to having w[i] equal observations of
the data point i. Setting a weight to 0 is equivalent to deleting the
observation. WeightObservations prepares the matrix AtA = A^t * A, btA =
b^t * A and btb = b^t * b with the given weights. Usually, least squares
approximating functions require these as input (SvdAnalysis, SvdBestBasis,
etc.)
Examples:
> A := [[1,2],[3,3],[4,7],[6,2]];
A := [[1, 2], [3, 3], [4, 7], [6, 2]]
> WeightObservations(A,[1,1,2,2],[10,5,2,0]);
[[[87, 121], [121, 183]], [41, 63], 23]
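The three returned quantities can be reproduced with a few lines of Python (not Darwin code). The function name is hypothetical; the sketch assumes, as the example output suggests, that each weight w[i] multiplies the contribution of observation i once.

```python
def weight_observations(A, b, w):
    """Return (AtA, btA, btb) with observation i counted w[i] times."""
    n, m = len(A), len(A[0])
    AtA = [[sum(w[i] * A[i][j] * A[i][k] for i in range(n))
            for k in range(m)]
           for j in range(m)]
    btA = [sum(w[i] * b[i] * A[i][j] for i in range(n)) for j in range(m)]
    btb = sum(w[i] * b[i] * b[i] for i in range(n))
    return AtA, btA, btb


A = [[1, 2], [3, 3], [4, 7], [6, 2]]
AtA, btA, btb = weight_observations(A, [1, 1, 2, 2], [10, 5, 2, 0])
```

The last observation has weight 0 and therefore contributes nothing, which matches the remark that a zero weight deletes the observation.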
See also: ?SvdAnalysis ?SvdBestBasis
WriteBlock
Function WriteBlock( ali:array(string) )
Write a sequence alignment in Block format.
Used notably by Geoff Barton's program alscript.
WriteData
Function WriteData - write data to a file
Calling Sequence: WriteData(data,filename,separator)
Parameters:
Name Type Description
-------------------------------------------------
data anything data to be saved
filename string name of file to be written
separator string string used as separator
Returns:
NULL
Synopsis: The WriteData function writes data to a file in a simple format.
Useful for exporting data to other applications. The filename defaults to
temp.dat and the separator is by default the tab character.
See also: ?FileStat ?LockFile ?OpenWriting ?WriteFasta ?WriteSeqXML
WriteFasta
Function WriteFasta
Calling Sequence: WriteFasta(seq)
WriteFasta(seq,labs,fname)
Parameters:
Name Type
---------------------
seq array(string)
labs array(string)
fname filename
Returns:
NULL
Synopsis: Writes an array of sequences to a file (default is temp.fasta). If
no labels are given, the sequences are numbered according to their order.
Examples:
> WriteFasta(['ACCGTA', 'AC_GTA']);
>1
ACCGTA
>2
AC_GTA
See also: ?OpenWriting ?WriteData ?WriteSeqXML
WriteSeqXML
Function WriteSeqXML - Writes a genome database into a SeqXML formatted file.
Calling Sequence: WriteSeqXML(f)
Parameters:
Name Type Description
-----------------------------------------------------------------------------
f string path to output file
db {database,string} (optional) path to database file / database handle
Returns:
NULL
Global Variables: DB
Synopsis: The function WriteSeqXML stores a genome database in SeqXML format.
If no 'db' argument is passed the database currently assigned to DB is used.
See also: ?WriteFasta
Zeta
Function Zeta
Calling Sequence: Zeta(s)
Parameters:
Name Type
--------------
s numeric
Returns:
numeric
Synopsis: This function computes the Riemann Zeta function defined by
Zeta(s) = sum( 1/i^s, i=1..infinity )
The sum converges for s > 1. Zeta has a simple pole at s=1. For all other
values it is defined as the complex-plane extension of the above sum.
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 23.2
Examples:
> Zeta(2);
1.6449
> Zeta(3);
1.2021
> Zeta(-0.5);
-0.2079
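For s > 1 the defining sum can be evaluated directly. The Python sketch below (not Darwin code) adds a simple tail estimate to a truncated sum; the function name, the default number of terms and the tail correction are choices made for this illustration and say nothing about how Darwin computes Zeta. Values at s <= 1, such as Zeta(-0.5), need the analytic extension and are outside the scope of this sketch.

```python
def zeta_sketch(s, terms=1000):
    """Approximate the Riemann Zeta function for real s > 1."""
    if s <= 1:
        raise ValueError("this sketch only handles s > 1")
    # Explicit part of the series: i = 1 .. terms-1.
    head = sum(i ** -s for i in range(1, terms))
    # Tail from i = terms onward: integral estimate plus half the first term.
    tail = terms ** (1 - s) / (s - 1) + 0.5 * terms ** -s
    return head + tail


z2 = zeta_sketch(2)
z3 = zeta_sketch(3)
```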
Zscore
Function Zscore - Test a statistical hypothesis
Calling Sequence: Zscore(data)
ZscorePercent(data)
Parameters:
Name Type Description
------------------------------------------------------------
data list Counts of observations, assumed equiprobable
data matrix Counts by two criteria, assumed independent
Returns:
{list,matrix}
Synopsis: Zscore transforms a vector or matrix of counts into a vector/matrix
of normalized variables (ones with expected value 0 and variance 1). This
is done by subtracting the expected value and dividing by the standard
deviation, i.e. Z = (X-E[X])/sqrt(Var(X)).
In this way the observations can be measured in "standard deviations away from
the mean", which is a simple and useful measure. This is sometimes called
the Z-transform, but since the Z-transform has a well established use in
power series, we use the name Zscore.
If the input is a vector of integers, it is assumed that all the values are
counts of events which are equally probable. If the input is a matrix it is
assumed that the values are counts of two independent events (columns/rows).
In both cases, a binomial distribution is assumed for the counts, i.e. the
individual events counted are independent of each other. ZscorePercent is
very similar, but instead of returning a normalized variable, it returns a
percentage of the expected value, i.e. Z = 100 * (X-E[X])/E[X]
Examples:
> Zscore( [8,12,21,7] );
[-1.3333, 0, 3, -1.6667]
> print(Zscore( [[3,7,21],[10,15,33]] ));
-0.73710648 -0.25050450 0.56887407
0.55192433 0.19114995 -0.47500296
> ZscorePercent( [8,12,21,7] );
[-33.3333, 0, 75, -41.6667]
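The computation described above can be sketched in Python (not Darwin code). The function names are hypothetical; the binomial variance n*p*(1-p) and the cell probability (row total * column total)/n^2 for matrices are taken from the description and agree with the example values, but the sketch is not the Darwin implementation.

```python
import math


def _z(x, n, p):
    """Normalize a count x drawn from Binomial(n, p)."""
    return (x - n * p) / math.sqrt(n * p * (1 - p))


def zscore(counts):
    """Z-scores for a list (equiprobable) or matrix (independent) of counts."""
    if counts and isinstance(counts[0], (list, tuple)):
        total = sum(map(sum, counts))
        rows = [sum(r) for r in counts]
        cols = [sum(c) for c in zip(*counts)]
        return [[_z(x, total, rows[i] * cols[j] / total ** 2)
                 for j, x in enumerate(r)]
                for i, r in enumerate(counts)]
    total = sum(counts)
    return [_z(x, total, 1 / len(counts)) for x in counts]


def zscore_percent(counts):
    """Deviation from the expected count, in percent (list input only)."""
    expected = sum(counts) / len(counts)
    return [100 * (x - expected) / expected for x in counts]


zs = zscore([8, 12, 21, 7])
zm = zscore([[3, 7, 21], [10, 15, 33]])
zp = zscore_percent([8, 12, 21, 7])
```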
See Also:
?Cumulative ?ProbBallsBoxes ?StatTest
?CumulativeStd ?ProbCloseMatches ?Std_Score
?OutsideBounds ?Rand ?TestStatResult
abs
Function abs - absolute value
Options: builtin, numeric, polymorphic and zippable
Calling Sequence: |x|
abs(x)
Parameters:
Name Type Description
------------------------------
x numeric an expression
Returns:
numeric
Synopsis: This function computes the absolute value of a number. Two
syntaxes are available, the functional one, abs(x) or the mathematical with
vertical bars: |x|. Please note that |x|, when x is an array is not the
norm of the vector, but the vector of the absolute values.
Examples:
> |-0.3|;
0.3000
> abs(cos(3));
0.9900
> |[-1,-2,-3]|;
[1, 2, 3]
antiparallel
Function antiparallel - reverse complement of a DNA sequence
Option: builtin
Calling Sequence: antiparallel(seq)
Parameters:
Name Type Description
----------------------------------
seq string a DNA/RNA sequence
Returns:
string
Synopsis: Computes the antiparallel sequence of a DNA/RNA sequence. This is
the complement in reverse order. For more clarity, the antiparallel of AACC
is GGTT. The reverse of AACC is CCAA and the Complement of AACC is TTGG.
The antiparallel of a DNA sequence describes a molecule that would form a
double helix with the sequence.
Examples:
> antiparallel('ACCUUC');
GAAGGU
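The operation is simply "reverse, then complement". A small Python sketch (not Darwin code) follows; the function name is hypothetical, and treating any sequence that contains U as RNA is an assumption of this sketch, not documented Darwin behaviour.

```python
def antiparallel_sketch(seq):
    """Reverse complement of a DNA or RNA string (A, C, G, T/U only)."""
    seq = seq.upper()
    is_rna = "U" in seq            # assumption: a U marks the input as RNA
    pairs = {"A": "U" if is_rna else "T",
             "C": "G", "G": "C", "T": "A", "U": "A"}
    return "".join(pairs[base] for base in reversed(seq))


rna = antiparallel_sketch("ACCUUC")
dna = antiparallel_sketch("AACC")
```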
See Also:
?AltGenCode ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt
?AminoToInt ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon
?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase
?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB
append
Function append - append to a list, set or structure
Option: builtin
Calling Sequence: append(L,e_1..e_k)
Parameters:
Name Type
-------------------------------
L a list, set or structure
e_i an arbitrary element
Returns:
{list,set,structure}
Synopsis: This function appends e_1..e_k to the list or structure L. If the
original list or set or structure has length less than 10, it appends on a
new copy of L. Otherwise it appends it to L and hence (likely) modifies the
original object. So if the first argument of append should not be
destroyed, the appending should be done on a copy of L. Appending is
written in a way that is efficient, even in case of appending thousands of
elements, one at a time, to an empty list. Appending to sets, although
efficient from the data enlargement point of view, is not efficient as every
new set is reordered. If a large set is to be built by appending one
element at a time, it is much more efficient to use a list and convert the
list to a set once the appending is finished. This function accepts a
variable number of additional arguments.
Examples:
> append( ABC(1,2,3), 4, 5 );
ABC(1,2,3,4,5)
> append( CreateArray(1..11,7), 77 );
[7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 77]
arcsin
Function arcsin - the inverse trigonometric function
Options: builtin, numeric, polymorphic and zippable
Calling Sequence: arcsin(x)
Parameters:
Name Type Description
--------------------------------------------
x numeric a numerical value, |x| <= 1
Returns:
numeric
Synopsis: This function computes the inverse of the trigonometric sine
function. For all -1 <= x <= 1, sin(arcsin(x))=x. For all -Pi/2 <= y <=
Pi/2, arcsin(sin(y))=y. The value returned by arcsin is the principal value,
which lies between -Pi/2 and Pi/2.
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 4.4
Examples:
> arcsin(0);
0
> arcsin(1/2);
0.5236
> arcsin(1);
1.5708
> arcsin(-1);
-1.5708
See also: ?arctan ?cos ?sin ?tan
arctan
Function arctan - the inverse trigonometric function
Options: builtin, numeric and polymorphic
Calling Sequence: arctan(y)
arctan(y,x)
Parameters:
Name Type Description
--------------------------------------------
y numeric a numerical value
x numeric an optional numerical value
Returns:
numeric
Synopsis: This function, with a single argument, computes the inverse tangent
function defined by: tan(arctan(y)) = y. The value returned satisfies
-Pi/2 <= arctan(y) <= Pi/2. With two arguments, it computes the
inverse tangent function defined by: tan(arctan(y,x)) = y/x when x <> 0.
The value returned with two arguments satisfies -Pi < arctan(y,x)
<= Pi. Arctan with two arguments computes the principal value of the
argument of the complex number x+I*y.
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 4.4
Examples:
> arctan(0);
0
> arctan(1);
0.7854
> arctan(1,0);
1.5708
> arctan(-1,0);
-1.5708
> arctan(1,1);
0.7854
> arctan(-1,-1);
-2.3562
See also: ?arcsin ?cos ?sin ?tan
assemble
Function assemble - creates an internal structure
Option: builtin
Calling Sequence: assemble(s)
Parameters:
Name Type Description
---------------------------------------------
s structure a structure of valid types
Returns:
anything : an arbitrary Darwin structure
Synopsis: Assemble and disassemble are a pair of functions which allow the
handling of procedures and expressions in Darwin. Disassemble transforms an
internal structure into a Darwin data structure, where the names of the
classes are the type names of the components. Assemble does exactly the
reverse. The existence of this pair of functions is to be able to inspect,
modify and create new bodies of procedures. Although they both work for any
structure, common structures can be manipulated directly. It is the body of
procedures which cannot be manipulated directly without dis/assemble.
Examples:
> assemble(power(a,2));
a^2
> assemble(list(expseq(1,2)));
[1, 2]
See Also:
?disassemble (the reverse operation) ?size
?length ?type (with a single argument)
assert
Function assert - test that an assertion is true
Option: builtin
Calling Sequence: assert(cond)
Parameters:
Name Type Description
-----------------------------------------
cond boolean a condition to be tested
Returns:
NULL
Synopsis: This function evaluates its argument, which is expected to be true
or false. If it evaluates to true, it does nothing. If it evaluates to
false it produces an "assertion failed" error. The first argument of the
error is the unevaluated expression that evaluated to false. It is thus easy
to write assertions which upon failure will automatically produce meaningful
errors.
Examples:
> assert(1=2);;
Error, 1 = 2, assertion failed
> Probab := 1.001;
Probab := 1.0010
> e := [ traperror(assert( Probab >=0 and Probab <= 1))];
e := [0 <= Probab and Probab <= 1, assertion failed]
> length(e);
2
> e[1];
0 <= Probab and Probab <= 1
See also: ?error ?lasterror ?traperror ?warning
assign
Function assign - assign a variable as a function call
Calling Sequence: assign(a,v)
Parameters:
Name Type
---------------
a name
v anything
Returns:
NULL
Synopsis: This function assigns the value v to the name a. The assign
function ignores the built-in scoping rules. Therefore, an assign call
inside of a procedure persists after it is finished executing. A variable
name can not be assigned a value from within a procedure if a global
variable of the same name has already been assigned a value.
Examples:
> z := proc() assign(t, 100); end:
> z();
> t;
100
See also: ?assigned ?eval ?names ?parse ?symbol
assigned
Function assigned - check if a name is assigned
Option: builtin
Calling Sequence: assigned(a)
Parameters:
Name Type
-----------
a name
Returns:
boolean
Synopsis: This function tests whether name a has been assigned a value or
symbol. It should not be used for tables, as the table is not a name and
unassigned entries evaluate to the default value. For tables, testing
should be done against the default value.
Examples:
> a:=5;
a := 5
> assigned(a);
true
> b:=c;
b := c
> assigned(b);
true
atoi
Function atoi - convert characters to integers
Calling Sequence: atoi(t)
Parameters:
Name Type
-------------
t string
Returns:
integer
Synopsis: The parameter t should be a string value formed over the symbols
0..9, an optional leading minus sign and the period symbol (.). This function
returns an integer value equal to trunc(tt) where tt is the numeric value of t.
Examples:
> atoi('3993');
3993
> atoi('-3.9');
-3
> type(");
integer
See also: ?sprintf ?trunc
avg
Function avg - average of numbers or list of numbers
Calling Sequence: avg(L1,L2,...)
Parameters:
Name Type Description
------------------------------------------------------------
Li {numeric,list(numeric)} a number or list of numbers
Returns:
numeric
Synopsis: Finds the average of all the values in the arguments.
Examples:
> avg(5, 97, 22, [14,15,16] );
28.1667
> avg(2,3,5,7,11,13,17,19);
9.6250
See also: ?max ?median ?min ?std ?var
ceil
Function ceil
Options: builtin, numeric and zippable
Calling Sequence: ceil(x)
Parameters:
Name Type
--------------
x numeric
Returns:
integer
Synopsis: ceil returns the smallest integer larger than or equal to x.
ceil(x) = -floor(-x) for all values of x.
Examples:
> ceil(-2);
-2
> ceil(-1.99999);
-1
> ceil(2.000001);
3
See also: ?floor ?iquo ?mod ?round ?trunc
coeff
Function coeff
Calling Sequence: coeff(s,v)
Parameters:
Name Type
-------------------------------
s an arithmetic expression
v a symbol name
Returns:
algebraic : the coefficient multiplying v in s
Synopsis: Coeff computes the linear coefficient in the variable v contained
in the algebraic expression s. The algebraic expression s may be any
mathematical expression which is not yet evaluated (in symbolic form, see
noeval).
Examples:
> t1 := noeval(3*a+b*c);
t1 := 3*a+b*c
> coeff(t1,a);
3
> coeff(t1,c);
b
See also: ?has ?hastype ?indets ?lcoeff ?mselect ?noeval ?subs ?types
compress
Function compress - compress an arbitrary object
Option: builtin
Calling Sequence: compress(obj)
Parameters:
Name Type Description
------------------------------------
obj anything object to compress
Returns:
compressed
Synopsis: Compress takes any structure and compresses it in a simple way.
The function decompress restores the original expression. Compression is
normally used when many structures must be kept in main memory, storing them
uncompressed would require too much memory, and they are used rarely enough
that it pays to decompress them before each use. There are several
internal structures which are not compressed, most notably Dayhoff matrices
and databases. Consequently, structures that reference these (e.g.
Alignment) will not be compressed. The compression factor is about 3:1 for
general structures on a 32-bit word implementation, higher for 64-bit words.
Examples:
> t := compress([1,2,{3,4}]);
t := [1, 2, {3,4}]
> decompress(t);
[1, 2, {3,4}]
> size(t)/size([1,2,{3,4}]);
0.1818
See also: ?decompress ?length ?size ?system
convolve
Function convolve - convolution of two or more vectors
Calling Sequence: convolve(v1,v2,...)
Parameters:
Name Type Description
----------------------------------------------------------------
v_i list(numeric) a numerical vector of arbitrary dimension
Returns:
list(numeric)
Synopsis: Compute the convolution of two or more numerical vectors. The
convolution of two vectors v1 and v2 of dimensions d1 and d2 is the vector r
with dimension d1+d2-1 with elements r[k] = sum( v1[i] * v2[k-i+1], i=1..k )
(references outside v1 or v2 are considered 0). The convolution of more
than two vectors is computed in an optimal order. Convolution is
associative and commutative, so the order of the operations does not affect
the result.
Examples:
> v1:=[1,2,3,4];
v1 := [1, 2, 3, 4]
> v2:=[1,1/2,1/3];
v2 := [1, 0.5000, 0.3333]
> convolve(v1,v2);
[1, 2.5000, 4.3333, 6.1667, 3, 1.3333]
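The convolution of vectors can be sketched in Python (not Darwin code) as follows. The function names are hypothetical, and the sketch folds several vectors from left to right rather than choosing an optimal order.

```python
from functools import reduce


def convolve_pair(v1, v2):
    """Convolution of two vectors; the result has len(v1)+len(v2)-1 entries."""
    r = [0.0] * (len(v1) + len(v2) - 1)
    for i, a in enumerate(v1):
        for j, b in enumerate(v2):
            r[i + j] += a * b      # 0-based i + j corresponds to 1-based k = i + j - 1
    return r


def convolve_sketch(*vectors):
    """Convolution of two or more vectors, folded left to right."""
    return reduce(convolve_pair, vectors)


result = convolve_sketch([1, 2, 3, 4], [1, 1 / 2, 1 / 3])
```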
See Also:
?Cholesky ?GivensElim ?matrix
?Eigenvalues ?Identity ?matrix_inverse
?GaussElim ?LinearProgramming ?transpose
copy
Function copy - copy a modifiable data structure (at a desired depth)
Option: builtin
Calling Sequence: copy(x)
copy(x,depth)
Parameters:
Name Type Description
-----------------------------------------------
x anything the structure/object to copy
depth posint optional depth of copying
Returns:
type(x)
Synopsis: This function returns an exact copy of any object it is passed.
This makes sense when we copy a modifiable object (strings, data structures,
lists, etc.) which we want to modify and we want to preserve the original in
its unmodified state. With a second argument, the copying will happen only
for the given number of levels. Omitting the second argument is equivalent
to using copy(x,infinity).
Examples:
> a := [1,2,[3,4]];
a := [1, 2, [3, 4]]
> a1 := copy(a,1);
a1 := [1, 2, [3, 4]]
> a2 := copy(a);
a2 := [1, 2, [3, 4]]
> a1[1] := 5;
a1[1] := 5
> a1[3,1] := 77;
a1[3,1] := 77
> a;
[1, 2, [77, 4]]
> a1;
[5, 2, [77, 4]]
> a2;
[1, 2, [3, 4]]
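The same distinction between a one-level copy and a full copy exists in Python, where it can be tried out with the standard `copy` module. This is only an analogy to the Darwin example above (Python lists are 0-indexed, so `a1[0]` corresponds to `a1[1]` in Darwin).

```python
import copy

a = [1, 2, [3, 4]]
a1 = copy.copy(a)        # one level only, comparable to copy(a,1)
a2 = copy.deepcopy(a)    # all levels, comparable to copy(a)

a1[0] = 5                # top level of a1 is private: a is unaffected
a1[2][0] = 77            # the nested list is still shared with a
```

After these assignments `a` shows the change made through the shared inner list, while `a2` keeps the original values throughout.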
cor
Function cor - an unbiased correlation estimate
Calling Sequence: cor(x)
cor(x,y,method)
Parameters:
Name Type Description
------------------------------------------------------------
x {list,matrix} a numeric matrix or list
y {list,matrix} (optional) a numeric matrix or list
method string (optional) choice of coefficient
Returns:
{numeric,matrix(numeric)}
Synopsis: This function computes the correlation of 'x' and 'y' if these are
lists. If 'x' and 'y' are matrices, the correlations between the columns of
'x' and the columns of 'y' are computed. The default of y, (i.e. 'y=NULL')
is equivalent to 'y=x', but more efficient. The optional string argument
'method' indicates which correlation coefficient is computed.
Available correlation coefficients:
pearson correlation coefficient
pearson pearson correlation coefficient
spearman spearman's rank correlation coefficient
kendall kendall's tau correlation coefficient
If method is 'kendall' or 'spearman', Kendall's tau or Spearman's rho
statistic is used to estimate a rank-based measure of association. These
are more robust and have been recommended if the data do not necessarily
come from a bivariate normal distribution. Note that 'spearman' basically
computes 'cor(R(x), R(y))' where 'R(u) := Rank(u)'
Examples:
> cor([1,5,8,4], [6,2,8,9]);
0.1306
> cor([[1,4],[2,4],[2,2],[6,1],[7,-5]], 'spearman');
[[1, -0.9211], [-0.9211, 1]]
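The Pearson and Spearman coefficients can be sketched in Python (not Darwin code). The function names are hypothetical, and giving tied values the average of their ranks is an assumption of this sketch; it reproduces the example values above but is not taken from the Darwin source.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = shared
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


rows = [[1, 4], [2, 4], [2, 2], [6, 1], [7, -5]]
col1, col2 = [r[0] for r in rows], [r[1] for r in rows]
p = pearson([1, 5, 8, 4], [6, 2, 8, 9])
s = spearman(col1, col2)
```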
See also: ?avg ?Covariance ?Rank ?StatTest ?std ?sum ?var
cos
Function cos - the trigonometric function
Options: builtin, numeric, polymorphic and zippable
Calling Sequence: cos(x)
Parameters:
Name Type Description
----------------------------
x numeric a number
Returns:
numeric
Synopsis: This function computes the trigonometric cosine function. cos(x)
has simple zeros at x=Pi/2+n*Pi.
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 4.3
Examples:
> cos(0);
1
> cos(Pi/4);
0.7071
> cos(Pi/2);
6.1232e-17
> cos(-Pi);
-1
See also: ?arcsin ?arctan ?sin ?tan
dSplitGraph
Function dSplitGraph( splits:list([numeric, set]), all:{posint,set} )
Computes a graph from a list of dSplits. Edges will have labels of the
format [length, splitnr] where splitnr is an index into splits which
corresponds to this edge. The procedure returns an expression sequence
g: Graph, angles: array(length(splits),numeric) where angles contains a
list of angles to be used as hints when drawing the edges of graph.
all is the set of all taxa of the split or a posint if the set is 1..all.
dSplitIndex
Function dSplitIndex( d:matrix(numeric), splits:list([numeric, set]) )
Computes the splittable fraction rho.
dSplitMetricSum
Function dSplitMetricSum( splits:list([numeric, set]), n:posint )
Computes the split decomposable distances d1. n is the number of taxa.
dSplits
Function dSplits( d:matrix(numeric) )
Computes the d-splits and their isolation indices from the distance matrix.
Returns list([index,set]).
date
Function date
Option: builtin
Calling Sequence: date()
Returns:
string
Synopsis: Returns the current date and time as a string.
See also: ?time ?UTCTime
debug
Function debug
Option: builtin
Calling Sequence: debug()
debug(arg)
Parameters:
Name Type
-----------------------------------
arg an optional symbol or string
Returns:
NULL
Synopsis: This function starts or stops the Darwin interactive debugger. The
argument is optional. Without an argument, a call to debug() will start the
interactive debugger. A call such as debug(false) will stop the debugger.
If the argument is a string, this will be understood as a device which
should be used to get the user's input (instead of stdin). E.g. if stdin is
used for other purposes, then debug( '/dev/tty' ) will force the debugger to
use /dev/tty for user input. At each interaction, the user can enter
commands to inspect, alter variables and continue the debugging process.
The interactive debugger is called whenever:
an assignment statement has been executed
an expression statement has been executed
an if boolean expression has been evaluated
The interactive debugger is activated by:
a Darwin level call by the function debug() or debug(true)
an interrupt ()
an error (when the option -de is used)
The interactive debugger polls the user and takes the following actions
depending on the input (when the debugger is expecting input, it will
prompt the user with a ">>")
command action
--------------------------------------------------------------------------------
, "l", continue, as currently set
"u", "k", set up to debug only at a higher level
"d", "j", set up to debug everything
"?", "h" short help
"o" quit the debugger, continue the computation
"p" print the current line
"q" quit darwin
"t" quit the debugger and computation and go to the top
"w" print the current stack and lines
Set(debug): start kernel debugging
xxx;, xxx: execute xxx as a darwin statement
Inspecting a variable may be achieved by executing a statement with just the
variable. Similarly, any expression can be computed/inspected. Changing
a variable can be done with an assignment statement.
See also: ?printlevel ?profiling ?Set
decompress
Function decompress - decompresses a compressed object
Option: builtin
Calling Sequence: decompress(compr)
Parameters:
Name Type Description
--------------------------------------
compr compressed compressed object
Returns:
anything
Synopsis: Compress takes any structure and compresses it in a simple way.
The function decompress restores the original expression. Normally this is
used when many structures must be kept in main memory, storing them
uncompressed would require too much memory, and each structure is used
rarely enough that it pays to decompress it just before use. The
compression factor is about 3:1 for general structures on a 32-bit word
implementation, higher for 64-bit words.
Examples:
> t := compress([1,2,{3,4}]);
t := [1, 2, {3,4}]
> decompress(t);
[1, 2, {3,4}]
> size(t)/size([1,2,{3,4}]);
0.1818
See also: ?compress ?length ?size ?system
disassemble
Function disassemble - produces a data structure from an internal structure
Option: builtin
Calling Sequence: disassemble(s)
Parameters:
Name Type Description
---------------------------------------------
s anything any valid Darwin expression
Returns:
structure
Synopsis: Assemble and disassemble are a pair of functions which allow the
handling of procedures and expressions in Darwin. Disassemble transforms an
internal structure into a Darwin data structure, where the names of the
classes are the type names of the components. Assemble does exactly the
reverse. This pair of functions exists so that the bodies of procedures can
be inspected, modified and created. Although both functions work for any
structure, common structures can be manipulated directly; only the bodies
of procedures cannot be manipulated without dis/assemble.
Examples:
> disassemble(x -> sin(x));
procedure(expseq(x),expseq(),expseq(operator,arrow),expseq(),expseq(),structure(sin,Param(1)))
> disassemble([1,{2,3}]);
list(1,set(2,3))
See Also:
?assemble (the reverse operation) ?size
?length ?type (with a single argument)
dprint
Function dprint - print so that it can be read back by Darwin
Calling Sequence: dprint(e1,e2,...)
Parameters:
Name Type Description
-----------------------------
ei anything expression
Returns:
NULL
Synopsis: This function prints out any Darwin expression. Expressions are
printed so that they could be read back by Darwin. In principle a structure
dprint-ed should produce, when read back in, the same structure (except for
numerical precision). If given multiple expressions, these will be
separated by commas, so that they can be read as an expression sequence.
Dprint emits a single newline character, at the end of the output, so
large expressions produce very long lines that may be hard to handle in
some systems. Floating point numbers are printed with 5 significant digits. The
global variable NumberFormat can be assigned a format, as in the printf
function, and all numbers will be printed accordingly. Inside a printf
statement, the format "%A" achieves the same effect as dprint.
Examples:
> dprint('a b c',1/3,1e9);
'a b c',0.3333,1000000000
> printf( '%A\n', ['a b c',1/3,1e9] );
['a b c',0.3333,1000000000]
See Also:
?lprint ?printf (contains conversion patterns) ?prints ?sscanf
?print ?PrintMatrix ?sprintf
enum
Function enum - list of consecutive integers
Calling Sequence: enum(n)
enum(r)
Parameters:
Name Type
--------------
n integer
r range
Returns:
list
Synopsis: This function returns the list of consecutive integers from 1 to
n or, for a range r=r_1..r_2, from r_1 to r_2.
Examples:
> enum(5);
[1, 2, 3, 4, 5]
> enum(4..10);
[4, 5, 6, 7, 8, 9, 10]
See also: ?seq ?zip
erf
Function erf - error function - 2/sqrt(Pi)*int( exp(-t^2), t=0..x )
Options: builtin, numeric and zippable
Calling Sequence: erf(x)
Parameters:
Name Type
--------------
x numeric
Returns:
numeric
Synopsis: This function returns the result of the following expression:
erf(x) = 2/sqrt(Pi) * int( exp(-t^2), t=0..x )
The probability of a normally distributed variable with mean m and variance
s^2 being less than x is 1/2+1/2*erf( (x-m)/sqrt(2*s^2) ).
References: Erdelyi53, Handbook of Mathematical functions, Abramowitz and
Stegun, 7.1
Examples:
> erf(0);
0
> erf(1.96/sqrt(2));
0.9500
> erf(3);
1.0000
> erf(-2);
-0.9953
See also: ?erfc ?erfcinv ?Normal_Rand
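The normal-probability formula in the erf synopsis can be checked numerically. The following is a minimal sketch in Python rather than Darwin; it relies only on Python's standard math.erf, and the helper name normal_cdf is hypothetical.

```python
import math

def normal_cdf(x, m=0.0, s=1.0):
    """P(X < x) for a normal variable with mean m and variance s*s."""
    return 0.5 + 0.5 * math.erf((x - m) / math.sqrt(2.0 * s * s))

# Probability mass within 1.96 standard deviations of the mean.
p_two_sided = math.erf(1.96 / math.sqrt(2.0))
# Probability mass below 1.96 standard deviations above the mean.
p_one_sided = normal_cdf(1.96)
```

The two-sided value agrees with the erf(1.96/sqrt(2)) example; the one-sided value is the same quantity folded through 1/2+1/2*erf(...).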
erfc
Function erfc - the complement of the error function
Options: builtin, numeric and zippable
Calling Sequence: erfc(x)
Parameters:
Name Type
--------------
x numeric
Returns:
numeric
Synopsis: This function returns the result of the following expression:
erfc(x) = 1 - erf(x) = 2/sqrt(Pi) * int( exp(-t^2), t=x..infinity )
References: Erdelyi53
Examples:
> erfc(3);
2.209e-05
> erfc(1.96/sqrt(2));
0.04999579
See also: ?erf ?erfcinv
erfcinv
Function erfcinv
Options: builtin, numeric and zippable
Calling Sequence: erfcinv(x)
Parameters:
Name Type Description
----------------------------
x numeric a number
Returns:
numeric
Synopsis: This function returns the inverse of erfc, that is, the value y
such that 2/sqrt(Pi) * int( exp(-t^2), t=y..infinity ) = x.
Examples:
> erfcinv(3);
Error, 3 is an invalid argument for erfcinv
> erfcinv(0.1);
1.1631
> sqrt(2)*erfcinv(0.05);
1.9600
See also: ?erf ?erfc
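The definition of erfcinv as the inverse of erfc can be illustrated by solving erfc(y) = x directly. This is a sketch in Python, not Darwin's implementation: it uses plain bisection on Python's math.erfc, the name erfcinv_bisect and the bracket [-6, 6] are assumptions, and the bracket limits accuracy for x extremely close to 0 or 2.

```python
import math

def erfcinv_bisect(x, tol=1e-12):
    """Solve erfc(y) = x for y by bisection; erfc only takes values in (0, 2)."""
    if not 0.0 < x < 2.0:
        raise ValueError("%r is an invalid argument for erfcinv" % (x,))
    lo, hi = -6.0, 6.0              # erfc(-6) is about 2, erfc(6) is about 2e-17
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > x:      # erfc is decreasing, so the root is to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y = erfcinv_bisect(0.1)
z = math.sqrt(2.0) * erfcinv_bisect(0.05)
```

The values y and z correspond to the erfcinv(0.1) and sqrt(2)*erfcinv(0.05) examples, and an argument of 3 is rejected because it lies outside the range of erfc.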
error
Function error - terminate execution and issue an error message
Option: builtin
Calling Sequence: error(msg,...)
Parameters:
Name Type Description
-----------------------------------------------------------
msg anything usually an error message
... anything additional arguments to clarify the error
Returns:
NULL
Synopsis: This function returns to the top level of execution and issues the
error message msg. If an error happens while executing a traperror()
function, then the flow will not return to the top level, instead the
traperror function will return with the value of the argument(s) of error().
When this happens, the global variable "lasterror" is set to the value of
the error, else it is unassigned. Using error/traperror allows for a simple
throw-catch mechanism.
Examples:
> f := proc(x) if x=0 then error('oops, div by 0') fi; 5/x end;
f := proc (x) if x = 0 then error(oops, div by 0) fi; 5/x end
> f(0);
Error, (in f) oops, div by 0
> traperror(f(0));
oops, div by 0
> lasterror;
oops, div by 0
See also: ?assert ?lasterror ?traperror ?warning
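For readers more familiar with exceptions, the error/traperror pair behaves roughly like raise and try/except. The sketch below is a Python analogy only, not Darwin code; trap_error and the state dictionary are hypothetical stand-ins for traperror and the global lasterror.

```python
def f(x):
    if x == 0:
        raise ValueError("oops, div by 0")
    return 5 / x

state = {"lasterror": None}         # stands in for Darwin's global lasterror

def trap_error(fn, *args):
    """Return fn(*args), or the argument(s) of the error if fn raises one."""
    try:
        result = fn(*args)
    except Exception as exc:
        # A single argument is returned as is, several as a tuple.
        state["lasterror"] = exc.args[0] if len(exc.args) == 1 else exc.args
        return state["lasterror"]
    state["lasterror"] = None       # no error: mirror "unassigned"
    return result

caught = trap_error(f, 0)
```

As in the Darwin example, the trapped call returns the error message instead of aborting to the top level.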
eval
Function eval
Option: builtin
Calling Sequence: eval(exp)
Parameters:
Name Type
-----------------
exp expression
Returns:
anything
Synopsis: This function forces the immediate and complete evaluation of exp.
Examples:
> eval(5+5);
10
> eval(parse('2+3!'));
8
See also: ?noeval
evalb
Function evalb
Option: builtin
Calling Sequence: evalb(exp)
Parameters:
Name Type
-----------------
exp expression
Returns:
boolean
Synopsis: This function forces an immediate evaluation of the boolean
expression exp.
Examples:
> evalb(5=5);
true
> evalb(true = (not(not(not false))));
true
See also: ?eval
exit
Function exit
Option: builtin
Calling Sequence: exit(status)
Parameters:
Name Type Description
--------------------------------------
status {0,posint} exit status code
Synopsis: The exit function terminates Darwin immediately and returns the
value of status to the parent process. A non-zero exit code indicates an
error whereas 0 indicates a successful termination.
References: man 3 exit
Examples:
> exit(2);
See also: ?return
exp
Function exp - exponential function
Options: builtin and polymorphic
Calling Sequence: exp(x)
exp(A)
Parameters:
Name Type
--------------------------------
x a numerical value
A a square numerical matrix
Returns:
numeric
matrix(numeric)
Synopsis: This function computes the exponential e^x (e = 2.71828...) if the
parameter is a single numerical value. If the parameter is a square matrix
A, it computes the matrix exponential e^A = I + A + A^2/2 + A^3/6 + ... For
all numerical values of x, ln(exp(x))=x.
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 4.2
Examples:
> exp(0);
1
> exp(5);
148.4132
> exp([[1, 2], [3, 4]]);
[[51.9690, 74.7366], [112.1048, 164.0738]]
See also: ?expx1 ?lg ?ln ?ln1x ?log ?log10
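The series e^A = I + A + A^2/2 + A^3/6 + ... from the exp synopsis can be summed directly for a small matrix. This is a Python sketch with hypothetical helper names (mat_mul, mat_exp_series); a truncated series is adequate for this small example, but it is not claimed to be the method Darwin uses.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp_series(a, terms=60):
    """Sum I + A + A^2/2! + ..., truncated after `terms` terms."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]       # current term A^k/k!, starting at I
    for k in range(1, terms):
        # A^k/k! is obtained from A^(k-1)/(k-1)! by one product and a division by k.
        term = [[v / k for v in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

e = mat_exp_series([[1.0, 2.0], [3.0, 4.0]])
```

The entries of e can be compared with the exp([[1, 2], [3, 4]]) example.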
expx1
Function expx1 - compute exp(x)-1 accurately for small x
Calling Sequence: expx1(x)
Parameters:
Name Type
------------------------
x a numerical value
Returns:
numeric
Synopsis: This function computes the exponential e^x-1 (e = 2.71828...).
This function is intended for very small values of x when exp(x) is too
close to 1, and hence significant precision is lost. For all numerical
values of x, ln1x(expx1(x))=x.
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 4.2
Examples:
> expx1(0);
0
> expx1(5e-20);
5e-20
> expx1(ln1x(7e-30));
7e-30
See also: ?exp ?lg ?ln ?ln1x ?log ?log10
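The precision loss that expx1 avoids is easy to reproduce in double precision. The sketch below uses Python's math.expm1 and math.log1p, which play the same roles as expx1 and ln1x; it illustrates the numerical issue and says nothing about Darwin's internals.

```python
import math

x = 5e-20
naive = math.exp(x) - 1.0       # exp(x) rounds to exactly 1.0, so this is 0.0
accurate = math.expm1(x)        # keeps the leading term x

# Round trip corresponding to ln1x(expx1(x)) = x for a tiny argument.
round_trip = math.log1p(math.expm1(7e-30))
```

The naive form loses every significant digit, while the dedicated function returns a value equal to x to within rounding.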
factorial
Function factorial
Options: builtin, numeric and zippable
Calling Sequence: factorial(n)
Parameters:
Name Type
------------------------------------
n an integer or numerical value
Returns:
numeric
Synopsis: factorial returns the product of 1*2*3*...*n for integer values of
n. For non-integer values it returns Gamma(n+1), the complex-plane
extension of factorial.
Gamma(z+1) = z*Gamma(z).
For non-integer values it is also possible to define factorial for negative
arguments. This function can be invoked with the standard postfix notation,
that is n! or in functional form, factorial(n).
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 6.1
Examples:
> 0!;
1
> 6!;
720
> factorial(-1.5);
-3.5449
See also: ?Gamma ?LnGamma
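The relation between factorial and Gamma can be checked with a few lines of Python. This is a sketch only; factorial_ext is a hypothetical name, and math.gamma raises an error at the poles (negative integers), which may differ from Darwin's behaviour there.

```python
import math

def factorial_ext(n):
    """n! for non-negative integers, Gamma(n+1) for other numeric values."""
    if isinstance(n, int) and n >= 0:
        return math.factorial(n)
    return math.gamma(n + 1.0)

values = [factorial_ext(0), factorial_ext(6), factorial_ext(-1.5)]
```

The three values correspond to the 0!, 6! and factorial(-1.5) examples; the last one is Gamma(-0.5).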
floor
Function floor
Options: builtin, numeric and zippable
Calling Sequence: floor(x)
Parameters:
Name Type
------------------------
x a numerical value
Returns:
integer
Synopsis: floor returns the largest integer less than or equal to x.
floor(x) = -ceil(-x) for all values of x.
Examples:
> floor(-2);
-2
> floor(-1.99999);
-2
> floor(2.000001);
2
See also: ?ceil ?iquo ?mod ?round ?trunc
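The identity floor(x) = -ceil(-x) can be spot-checked on the example values. This is a small Python check using the standard math module, not Darwin code; the sample list is arbitrary.

```python
import math

samples = [-2, -1.99999, 2.000001, 0.5, -0.5]
floors = [math.floor(x) for x in samples]
identity_holds = all(math.floor(x) == -math.ceil(-x) for x in samples)
```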
gc
Function gc - garbage collection
Option: builtin
Calling Sequence: gc()
Returns:
NULL
Synopsis: This function forces Darwin to immediately coalesce all memory that
is allocated but not in use. Unless the system variable printgc is set to false,
this function prints the current number of bytes allocated and the total CPU
time used so far.
Examples:
> gc();
See also: ?Set (the gc option)
gcd
Function gcd - greatest common divisor
Calling Sequence: gcd(a1..ak)
Parameters:
Name Type
-----------------------
ai an integer value
Returns:
integer
Synopsis: Gcd computes the greatest common divisor of all the arguments
given, that is, the largest integer that exactly divides every argument.
Gcd takes a variable number of arguments, but all of them must be integers.
Examples:
> gcd(91,21);
7
> gcd(999999,142857);
142857
> gcd();
0
> gcd(20,25,-30,-40);
5
See also: ?iquo ?mod
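A variadic gcd, including the gcd() = 0 case shown in the examples, follows from folding the two-argument Euclidean algorithm with 0 as the starting value. The following Python sketch uses hypothetical names (gcd2, gcd_all) and is not claimed to be Darwin's implementation.

```python
from functools import reduce

def gcd2(a, b):
    """Euclid's algorithm on absolute values."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a

def gcd_all(*args):
    """Fold gcd2 over the arguments; gcd2(0, x) = |x|, so no arguments gives 0."""
    return reduce(gcd2, args, 0)

results = [gcd_all(91, 21), gcd_all(999999, 142857), gcd_all(),
           gcd_all(20, 25, -30, -40)]
```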
getpid
Function getpid
Option: builtin
Calling Sequence: getpid()
Returns:
posint
Synopsis: This function returns the process identification number assigned by
the operating system to the current invocation of Darwin.
Examples:
> getpid();
25033
gigahertz
Function gigahertz - estimate the processor speed
Calling Sequence: gigahertz()
Returns:
numeric
Synopsis: This function estimates the computing power of the processor which
is running. The value has been tuned so that a Pentium III processor rated
at 750MHz gives 0.75 as a result. Hence, this is a measure equivalent to
the clock rate, in GHz, of such a processor. Many factors affect the
efficiency of Darwin running on a particular processor, e.g. compiler,
system load, cache size, type of processor and memory speed, so this number
should be treated with extreme care. The function executes alignments, some
counting, random number generation and some linear algebra to obtain the
estimate.
Examples:
> gigahertz();
4.1970
See also: ?time
has
Function has - test if a structure contains a value
Option: builtin
Calling Sequence: has(str,val)
Parameters:
Name Type Description
------------------------------------------
str anything an arbitrary structure
val anything value to be found in str
Returns:
boolean
Synopsis: The function tests whether the second argument is part of the first
argument.
Examples:
> has([1,2,3],2);
true
> has(A(1,2,3),4);
false
> has({[A(77)]},77);
true
See Also:
?coeff ?indets ?mselect ?subs
?hastype ?lcoeff ?noeval ?types
hash
Function hash - hashing value of an arbitrary expression
Option: builtin
Calling Sequence: hash(expr)
Parameters:
Name Type Description
---------------------------------------
expr anything any Darwin expression
Returns:
integer
Synopsis: The hash function returns an arbitrary integer computed from the
given expression. This hashing value is guaranteed to be the same for
identical expressions, but it is not guaranteed to be unique. That is,
there could be two different expressions which yield the same hash value.
The hash value of a string with a single character is the numerical value of
its ascii representation plus a constant. Hashing values are used
internally for the remember function, and may be used by the user for
similar purposes (detecting that two expressions are different without
actually comparing them). The hashing values are not guaranteed to be the
same across different systems, in particular they depend on the integer word
size.
Examples:
> hash([1,2]);
7881299347950511
> hash('abc');
3377699728949699
> hash(abc);
3377699728949699
> hash(a)-hash(A);
32
> hash(ASHYMY)-hash(YYYWYN);
-927712935936
See also: ?remember ?sha2 ?table
hastype
Function hastype - test if a structure contains any object of a given type
Calling Sequence: hastype(str,typ)
Parameters:
Name Type Description
-------------------------------------------
str anything an arbitrary structure
typ type a type to be found in str
Returns:
boolean
Synopsis: The function hastype tests whether the first argument contains any
value of the given type.
Examples:
> hastype([1,2,3],posint);
true
> hastype(A(1,2,3),list);
false
> hastype({[A(77)]},list);
true
See also: ?coeff ?has ?indets ?lcoeff ?mselect ?noeval ?subs ?types
help
Function help
Calling Sequence: help(topic)
? topic
Parameters:
Name Type
--------------
topic string
Returns:
NULL
Global Variables: HelpIndex HelpText
Synopsis: The help and ? functions search (approximately) for the topic in
the Darwin system and print out any description lines for these routines.
The help function is case insensitive.
Users should take note that print(topic) and help(topic) have different
semantics. Firstly, print(topic) performs no approximate search for topic,
and secondly, it computes the description for topic dynamically (any
examples are run immediately).
Examples:
> help(Match);
. . .
> ? phylogenetic;
. . .
See also: ?print
hostname
Function hostname
Option: builtin
Calling Sequence: hostname()
Returns:
string
Synopsis: This function returns the name of the current host on which the
current session is running.
Examples:
> hostname();
linneus78
See also: ?CallSystem ?getpid
ilogb
Function ilogb
Options: builtin, numeric and zippable
Calling Sequence: ilogb(x)
Parameters:
Name Type
------------------------
x a numerical value
Returns:
integer
Synopsis: ilogb returns the exponent of the floating point representation of
x. This function is defined in the IEEE 754 floating point standard. It is
the floor of the logarithm base 2 of |x|, for |x| >= 1, computed directly
from the representation (very fast). For non-zero arguments, and for IEEE
base 2 floating point numbers, 1 <= |x| / 2^ilogb(x) < 2.
Examples:
> ilogb(2);
1
> ilogb(1.0e-307);
-1020
> ilogb(0);
-2098
> ilogb(Pi);
1
See also: ?lg ?scalb
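The property 1 <= |x| / 2^ilogb(x) < 2 can be reproduced from the binary representation of a double. This Python sketch uses math.frexp and a hypothetical name, ilogb_sketch; it covers only finite non-zero arguments and does not reproduce the value Darwin returns for 0.

```python
import math

def ilogb_sketch(x):
    """Binary exponent e with 1 <= |x| / 2**e < 2, for finite non-zero x."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("only finite non-zero arguments are handled here")
    # frexp returns x = mantissa * 2**exponent with 0.5 <= |mantissa| < 1.
    mantissa, exponent = math.frexp(x)
    return exponent - 1

exponents = [ilogb_sketch(2), ilogb_sketch(1.0e-307), ilogb_sketch(math.pi)]
```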
indets
Function indets - return all subexpressions of a given type
Calling Sequence: indets(str,typ)
Parameters:
Name Type Description
---------------------------------------------------------
str anything an arbitrary structure
typ type (optional) a type to be searched in str
Returns:
set(typ)
Synopsis: The function indets returns a set with all the subexpressions in
str which are of type typ. If the type typ is omitted, it is assumed to be
"symbol".
Examples:
> indets([1,-2,3.1,abc],posint);
{1}
> indets(A(1,[77],[[]]),list);
{[],[77],[[]]}
> t := Tree(Tree(Leaf(A),5,Leaf(B)),0,Tree(Leaf(C),11,Leaf(D)));
t := Tree(Tree(Leaf(A),5,Leaf(B)),0,Tree(Leaf(C),11,Leaf(D)))
> indets(t,Leaf);
{Leaf(A),Leaf(B),Leaf(C),Leaf(D)}
> indets(t);
{A,B,C,D}
See also: ?coeff ?has ?hastype ?lcoeff ?mselect ?noeval ?subs ?types
intersect
Function intersect
Options: builtin and polymorphic
Calling Sequence: a intersect b
intersect(a,b)
Parameters:
Name Type
-----------
a set
b set
Returns:
set
Synopsis: Computes the intersection of two sets, that is, the set of all
elements that are in both a and b. The value intersect() is understood to be the
entire universe, and hence intersections including intersect() will simply
return the other argument. In its functional form, any arbitrary number of
sets can be intersected. In particular, intersect(a) = a.
Examples:
> {1,2,3} intersect {2,3,4};
{2,3}
> {1,2,3} intersect {};
{}
> {1,2,3} intersect intersect();
{1,2,3}
See also: ?member ?minus ?subset ?union
invlogit
Function invlogit( l:numeric )
Convert 10 log10(p) to log10(p/(1-p)).
iquo
Function iquo
Option: polymorphic
Calling Sequence: iquo(a,b)
Parameters:
Name Type
--------------
a integer
b integer
Returns:
integer
Synopsis: iquo returns the integer quotient of a divided by b. If b=0, a
division by zero fault is generated. The result is truncated towards zero
for both positive and negative results. Formally, iquo(a,b) = trunc(a/b).
Examples:
> iquo(7,3);
2
> iquo(-3,2);
-1
> iquo(121,11);
11
See also: ?ceil ?floor ?mod ?round ?trunc
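Truncation towards zero differs from floor division for negative results, which is a common source of confusion when porting code. The Python sketch below (hypothetical name iquo_sketch) reproduces the truncating behaviour described for iquo and contrasts it with Python's own // operator, which rounds towards minus infinity.

```python
def iquo_sketch(a, b):
    """Integer quotient truncated towards zero, using integer arithmetic only."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    q = abs(a) // abs(b)
    # The quotient is negative exactly when the signs of a and b differ.
    return q if (a < 0) == (b < 0) else -q

truncated = [iquo_sketch(7, 3), iquo_sketch(-3, 2), iquo_sketch(121, 11)]
floored = -3 // 2       # floor division, shown only for contrast
```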
islower
Function islower( c:string )
Returns true if c is lower case, else returns false
isupper
Function isupper( c:string )
Returns true if c is upper case, else returns false
iterate
Function iterate - make available one value for an iterator
Option: builtin
Calling Sequence: iterate(v)
Parameters:
Name Type Description
------------------------------------------------------------
v anything a value that will be used by a for-in loop
Returns:
NULL
Synopsis: iterate is used inside an iterator function to feed a value to the
calling for-in loop. The argument(s) of iterate are evaluated, and the for
loop variable is assigned this value, and another iteration is performed.
The body of the for loop is executed by the call to iterate.
See Also:
?Entries ?iterator ?Lines ?Postfix ?Primes
?Infix ?Leaves ?objectorientation ?Prefix ?Sequences
json
Function json - serialize darwin structure as json compatible string
Calling Sequence: json(obj)
Parameters:
Name Type Description
-----------------------------------------
obj anything object to be serialized
Returns:
string
Synopsis: This function serializes any Darwin object into a JSON-formatted
string. Darwin objects are encoded as objects with a '_darwinType' and a
'data' field.
References: http://www.json.org
Examples:
> json( [1,2,'blue']);
[1,2,"blue"]
> json(Complex(5,2));
{"_darwinType":"Complex","data":[5,2]}
See also: ?OpenWriting ?WriteSeqXML
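Output in this encoding can be consumed by any JSON library. The Python sketch below parses the string from the Complex example and rebuilds the same wrapper; the helper name darwin_style_json is hypothetical, and only the two fields named in the synopsis are assumed.

```python
import json

def darwin_style_json(type_name, data):
    """Wrap data in an object with '_darwinType' and 'data' fields."""
    return json.dumps({"_darwinType": type_name, "data": data},
                      separators=(",", ":"))

# Parse the string shown in the Complex example.
decoded = json.loads('{"_darwinType":"Complex","data":[5,2]}')
real_part, imag_part = decoded["data"]

# Rebuild the same string from its parts.
encoded = darwin_style_json(decoded["_darwinType"], decoded["data"])
```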
latex
Function latex - convert a document or part of it to latex
Option: polymorphic
Calling Sequence: latex(a,titl,auth)
LaTeX(a,titl,auth)
LaTeXC(a)
Parameters:
Name Type Description
----------------------------------------------------------------
a {string,structure} object to convert to latex
titl string (optional) title of the document
auth string (optional) author(s) of the document
Returns:
string
Synopsis: The latex function converts an object, typically a Document or a
part thereof, to latex. LaTeX is a synonym of latex; it is much more
difficult to type, but it is the spelling used by Leslie Lamport. LaTeXC is
used for a component, that is, no headers/trailers will be produced.
Examples:
> t := Table( center, border, Row('abc','cde')):
> prints(LaTeXC(t));
\begin{table}[!ht]
\begin{center}
\begin{tabular}{|c|c|}
\hline
abc & cde\\
\hline
\end{tabular}
\end{center}
\end{table}
> d := Document('Species evolve, that''s it.'):
> prints(latex(d,'The origin of species','Charles Darwin'));
% automatically generated by Darwin
% prepared on Tue Feb 19 10:54:59 2013
% running on linneus78
% by user darwin
\documentclass{article}
\usepackage{html,color,epsfig}
\setlength{\parindent}{5pt}
\begin{document}
\title{The origin of species}
\author{Charles Darwin}
\maketitle
Species evolve, that's it.
\end{document}
See Also:
?Block ?Document ?List ?RunDarwinSession
?Code ?HTML ?Paragraph ?screenwidth
?Color ?HyperLink ?PostscriptFigure ?Table
?Copyright ?Indent ?print ?TT
?DocEl ?LastUpdatedBy ?Roman ?View
lcoeff
Function lcoeff - leading coefficient
Calling Sequence: lcoeff(s)
Parameters:
Name Type
-------------------------------
s an arithmetic expression
Returns:
algebraic : the leading coefficient in s
Synopsis: lcoeff computes the leading numerical coefficient contained in the
algebraic expression s. The algebraic expression s may be any mathematical
expression which is not yet evaluated (in symbolic form, see noeval). In
case of a sum, the leading coefficient is extracted from the first
(positional) coefficient.
Examples:
> t1 := noeval(3*a+b*c);
t1 := 3*a+b*c
> lcoeff(t1);
3
See also: ?coeff ?has ?hastype ?indets ?mselect ?noeval ?subs ?types
length
Function length - length of an object
Option: builtin
Calling Sequence: length(obj)
Parameters:
Name Type Description
--------------------------------------------
obj {array,list,set,string} any object
Returns:
{0,posint}
Synopsis: Returns the length of the given object obj.
Examples:
> length('');
0
> length({1,2,{a,b,c}});
3
> length([1,2,3,4]);
4
> length('length');
6
See also: ?assemble ?Class ?CreateArray ?disassemble ?size
lg
Function lg
Calling Sequence: lg(x)
Parameters:
Name Type
-------------------------------------------
x a positive number or a square matrix
Returns:
{numeric,matrix(numeric)}
Synopsis: lg computes the logarithm base 2 of a number or a square matrix.
For all arguments it is true that lg(2^x) = x. For positive arguments or
for matrices for which the logarithm can be computed, it is always true that
2^lg(x) = x.
Examples:
> lg(7.5);
2.9069
> lg(16);
4
> lg( [[2,1],[0,3]]);
[[1, 0.5850], [0, 1.5850]]
> 2^lg( [[2,1],[0,3]]);
[[2.0000, 1.0000], [0, 3.0000]]
See also: ?exp ?ilogb ?ln ?ln1x ?log ?log10
ln
Function ln
Options: builtin and polymorphic
Calling Sequence: ln(x)
ln(A)
Parameters:
Name Type
--------------------------------
x a numerical value > 0
A a square numerical matrix
Returns:
numeric
matrix(numeric)
Synopsis: This function computes the logarithm base e (e = 2.71828...) if the
parameter is a single numerical value. This is usually called the natural
logarithm. If the argument is a square matrix, it computes a square matrix
B with the same dimensions as A such that e^B=A, or the natural logarithm of
a square matrix. Not all matrices have a logarithm (which is real-valued).
For all numerical values of x, ln(exp(x))=x.
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 4.1
Examples:
> ln(1);
0
> ln(5);
1.6094
> ln([[2, 1], [3, 4]]);
[[0.4024, 0.4024], [1.2071, 1.2071]]
See also: ?exp ?expx1 ?ilogb ?lg ?ln1x ?log ?log10
ln1x
Function ln1x - compute ln(1+x) accurately for small x
Calling Sequence: ln1x(x)
Parameters:
Name Type
-----------------------------
x a numerical value > -1
Returns:
numeric
Synopsis: This function computes the logarithm base e (e = 2.71828...) of
1+x. This is necessary when the value of x is very small, and computing 1+x
would produce a significant truncation. A typical such computation is when
1 - (1-eps)^n has to be computed, and eps is very small and n is very large.
This can be done accurately with -expx1(n*ln1x(-eps)).
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 4.1
Examples:
> ln1x(0.001)-ln(1.001);
1.0994e-16
> ln1x(1e-60);
1e-60
See also: ?exp ?expx1 ?ilogb ?lg ?log ?log10
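The computation of 1 - (1-eps)^n mentioned in the ln1x synopsis is sketched below in Python, where math.log1p and math.expm1 play the roles of ln1x and expx1. The function name prob_at_least_one and the sample values of eps and n are assumptions chosen for illustration.

```python
import math

def prob_at_least_one(eps, n):
    """1 - (1 - eps)**n computed as -expm1(n * log1p(-eps))."""
    return -math.expm1(n * math.log1p(-eps))

eps, n = 1e-20, 10**6
naive = 1.0 - (1.0 - eps) ** n          # 1.0 - eps rounds to 1.0, so this is 0.0
stable = prob_at_least_one(eps, n)      # close to n * eps
```

The naive form returns 0 because 1-eps is not representable, whereas the stable form keeps the result close to n*eps.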
lnProbBallsBoxes
Function lnProbBallsBoxes - probability of hitting k eps-boxes with n balls
Calling Sequence: lnProbBallsBoxes(k,n,eps)
Parameters:
Name Type Description
----------------------------------------------------------
k posint number of boxes
n posint number of balls randomly thrown in [0,1]
eps positive 0
Examples:
> lnProbBallsBoxes(3,10,0.0001);
-21.0528
See Also:
?Cumulative ?DigestWeights ?OutsideBounds ?StatTest
?DigestAspN ?DynProgMass ?ProbBallsBoxes ?Std_Score
?DigestionWeights ?DynProgMassDb ?ProbCloseMatches
?DigestSeq ?enzymes ?SearchMassDb
?DigestTrypsin ?MassProfileResults ?Stat
log
Function log
Options: builtin and polymorphic
Calling Sequence: log(x)
log(A)
Parameters:
Name Type
--------------------------------
x a numerical value > 0
A a square numerical matrix
Returns:
numeric
matrix(numeric)
Synopsis: This function computes the logarithm base e (e = 2.71828...) if the
parameter is a single numerical value. This is usually called the natural
logarithm. If the argument is a square matrix, it computes a square matrix
B with the same dimensions as A such that e^B=A, or the natural logarithm of
a square matrix. Not all matrices have a logarithm (which is real-valued).
For all numerical values of x, log(exp(x))=x. Log is an alias for ln.
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 4.1
Examples:
> log(1);
0
> log(5);
1.6094
> log([[2, 1], [3, 4]]);
[[0.4024, 0.4024], [1.2071, 1.2071]]
See also: ?exp ?expx1 ?ilogb ?lg ?ln ?ln1x ?log10
log10
Function log10
Calling Sequence: log10(x)
log10(A)
Parameters:
Name Type Description
----------------------------------------
x numeric numeric > 0
A matrix a square numeric matrix
Returns:
numeric
matrix(numeric)
Synopsis: This function computes the logarithm (base 10) if the parameter is
a single numerical value. If the argument is a square matrix, it computes a
square matrix B with the same dimensions as A such that 10^B = A. Not all
matrices have a logarithm base 10 (which is real-valued).
Examples:
> log10(7.5);
0.8751
> log10(10);
1
See also: ?lg ?ln ?ln1x ?log
logit
Function logit( L:numeric )
Convert log10(p/(1-p)) to 10 log10(p).
lowercase
Function lowercase
Option: builtin
Calling Sequence: lowercase(t)
Parameters:
Name Type
-------------
t string
Returns:
string
Synopsis: The string t is converted to lowercase letters.
Examples:
> lowercase('Not NEARLY SO BoLD');
not nearly so bold
See also: ?uppercase
lprint
Function lprint - linear print of expression(s)
Option: builtin
Calling Sequence: lprint(e1,e2,...)
Parameters:
Name Type Description
-----------------------------
ei anything expression
Returns:
NULL
Synopsis: This function prints out any Darwin built-in type or structured
type. If the expression is too long, newline characters will be inserted in
a semi-intelligent way. Multiple expressions are separated by a single
space. Floating point numbers are printed with 5 significant digits. The
global variable NumberFormat can be assigned a format, as in the printf
function, and all numbers will be printed accordingly. lprint is intended
to provide a safe and quick way of printing expressions. In general, it is
not possible to read them back into Darwin; use dprint for Darwin-readable
output.
Examples:
> x:= [[1,2],[3,4]]:
> lprint('A linear printing of a square matrix:', x);
A linear printing of a square matrix: [[1, 2], [3, 4]]
See Also:
?dprint ?printf (contains conversion patterns) ?prints ?sscanf
?print ?PrintMatrix ?sprintf
matrix_inverse
Function matrix_inverse - invert a square matrix
Option: builtin
Calling Sequence: matrix_inverse(A)
Parameters:
Name Type Description
--------------------------------------------------------
A matrix a matrix for which the inverse is wanted
Returns:
matrix
Synopsis: Computes the inverse of a square matrix. If A is a square matrix
the same effect is obtained by computing A^(-1). To solve a system of
linear equations, GaussElim(A,b) is more efficient than A^(-1) * b.
Examples:
> A := [[3,1,2],[1,2,-1],[2,-1,5]];
A := [[3, 1, 2], [1, 2, -1], [2, -1, 5]]
> A^(-1);
[[0.9000, -0.7000, -0.5000], [-0.7000, 1.1000, 0.5000], [-0.5000, 0.5000, 0.5000]]
See Also:
?Cholesky ?Eigenvalues ?GivensElim ?LinearProgramming ?transpose
?convolve ?GaussElim ?Identity ?matrix
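The remark that elimination is preferable to forming A^(-1) * b can be illustrated by solving A*x = b directly. The Python sketch below implements Gaussian elimination with partial pivoting under the hypothetical name gauss_solve; it is not Darwin's GaussElim, only an example of the technique.

```python
def gauss_solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b] with float entries.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

A = [[3, 1, 2], [1, 2, -1], [2, -1, 5]]
x = gauss_solve(A, [1, 0, 0])
```

With b = [1, 0, 0] the solution x is the first column of the inverse shown in the example.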
max
Function max - maximum of numbers or list of numbers
Options: builtin and numeric
Calling Sequence: max(L1,L2,...)
Parameters:
Name Type Description
------------------------------------------------------------------------------------------
Li {numeric,list(numeric),list(list(numeric))} numbers or list (of lists) of numbers
Returns:
numeric
Synopsis: Finds the maximum valued element among the arguments. Each
argument Li may be a number, a list of numbers or a list of lists of
numbers; lists are effectively flattened to a simple list before the
maximum valued element is returned.
Examples:
> max(5, 97, 22, [14,15,16] );
97
> max(2,3,5,7,11,13,17,19);
19
See also: ?avg ?min ?std ?var
median
Function median - median of numbers or list of numbers
Calling Sequence: median(L1,L2,...)
Parameters:
Name Type Description
------------------------------------------------------------
Li {numeric,list(numeric)} a number or list of numbers
Returns:
numeric
Synopsis: Finds the median of all the values in the arguments.
Examples:
> median(5, 97, 22 );
22
> median(2,3,5,7,11,13,17,19);
9
See also: ?avg ?max ?min ?std ?var
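The second median example has an even number of values, and its result is consistent with averaging the two middle values. The Python sketch below reproduces both examples with the standard statistics module; the helper median_of, which accepts a mix of numbers and lists, is hypothetical.

```python
import statistics

def median_of(*args):
    """Median of all values in the arguments, which may be numbers or lists."""
    values = []
    for a in args:
        if isinstance(a, (list, tuple)):
            values.extend(a)
        else:
            values.append(a)
    # For an even count, statistics.median averages the two middle values.
    return statistics.median(values)

odd_case = median_of(5, 97, 22)
even_case = median_of(2, 3, 5, 7, 11, 13, 17, 19)
```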
member
Function member
Option: builtin
Calling Sequence: member(a,b)
Parameters:
Name Type Description
----------------------------------------------------------------------
a anything element to be tested for membership in set or list
b {list,set} a set or list
Returns:
boolean
Synopsis: The member function returns true iff element a is in the set/list b.
Examples:
> member(5, [1,2,5,7]);
true
See Also:
?intersect ?SearchArray ?subset ?union
?minus ?SearchOrderedArray ?table
min
Function min - minimum of numbers or list of numbers
Options: builtin and numeric
Calling Sequence: min(L1,L2,...)
Parameters:
Name Type Description
------------------------------------------------------------------------------------------
Li {numeric,list(numeric),list(list(numeric))} numbers or list (of lists) of numbers
Returns:
numeric
Synopsis: Finds the minimum valued element among the arguments. Each
argument Li may be a number, a list of numbers or a list of lists of
numbers; lists are effectively flattened to a simple list before the
minimum valued element is returned.
Examples:
> min(5, 97, 22, [14,15,16] );
5
> min(2,3,5,7,11,13,17,19);
2
See also: ?avg ?max ?median ?std ?var
minus
Function minus
Options: builtin and polymorphic
Calling Sequence: a minus b
minus(a,b)
Parameters:
Name Type
------------
a a set
b a set
Returns:
set
Synopsis: Computes the set difference of two sets; that is a set consisting
of all elements in a but not in b. The value intersect() is understood to
be the entire universe, and hence subtracting intersect() will return the
empty set and subtracting from intersect() is not allowed.
Examples:
> {1,2,3} minus {2,3,4};
{1}
> {1,2,3} minus {};
{1,2,3}
> {1,2,3} minus intersect();
{}
See also: ?intersect ?member ?subset ?union
mod
Function mod
Options: builtin, numeric and polymorphic
Calling Sequence: mod(x,y)
Parameters:
Name Type Description
-----------------------------
x numeric a number
y numeric a number > 0
Returns:
numeric
Synopsis: This function computes x (mod y), i.e. it returns the integer
remainder after dividing x by y. Note: if x or y are so large that they
cannot be represented exactly as integers in a double precision number, the
results may be wrong.
Examples:
> mod(5,2);
1
> mod(99,1);
0
> mod(-3,2);
1
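The sign convention and the precision note above can be illustrated with a small sketch. It is written in Python, not Darwin, and the helper name floor_mod is hypothetical; it assumes that mod follows the floor-based definition x - y*floor(x/y), which agrees with the three examples shown but is not confirmed as Darwin's internal method.

```python
import math

def floor_mod(x, y):
    """Remainder in the range [0, y), assuming the floor-based definition."""
    if y <= 0:
        raise ValueError("y must be > 0")
    return x - y * math.floor(x / y)

# The three documented examples.
r1 = floor_mod(5, 2)     # 1
r2 = floor_mod(99, 1)    # 0
r3 = floor_mod(-3, 2)    # 1 (not -1)

# 2**53 + 1 is odd, but it cannot be represented exactly in double
# precision: it is stored as 2**53, so the remainder comes out as 0.
too_large = float(2**53 + 1)
r4 = floor_mod(too_large, 2)
```

The last call shows the kind of wrong result the note warns about: the true remainder is 1, but the argument was already rounded before the division took place.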
See also: ?ceil ?floor ?round ?trunc
mselect
Function mselect
Calling Sequence: mselect(fn,obj,[arg2,...])
Parameters:
Name Type
----------------------------------------------------------------------------
fn the selection function, returns true/false
obj a composed object (list, set, structure) whose parts
will be selected
arg2 additional arguments that are passed to fn
Returns:
anything : the result is of the same type as obj
Synopsis: Mselect selects the parts of the second argument and builds a new
object of the same type, but only with the parts for which the function fn
is true. More precisely, for each i from 1 to length(obj), op(i,obj) will
be in the result depending on fn(op(i,obj),arg2..) being true or false. The
extra arguments, arg2, ... are passed as additional arguments to fn.
Mselect is normally used on lists, sets or structures.
Examples:
> mselect( type, [-1,0,1,1.2], posint );
[1]
> mselect( x -> (x<1), [-1,0,1,1.2] );
[-1, 0]
See also: ?op
names
Function names - find all assigned names
Option: builtin
Calling Sequence: names(typ)
Parameters:
Name Type
------------------------
typ {'assigned',type}
Returns:
an expression sequence
Synopsis: When no arguments are specified (or with "anything" as a typ), the
names function returns all names, assigned or unassigned. When the argument
typ is included, all names which are assigned a value of type typ are
returned. The special typ value "assigned" will return only the names which
are assigned. Be careful when using all the names: some names, such as
break, next etc., may produce very unexpected results when evaluated.
Examples:
> names(numeric);
LongInteger_log2base, DBmarkG, SumSq, NBody_Cost, ScaleIndex_I, MLPamDistance, AveNormSD_lim, DBL_EPSILON, DBL_MAX, MinIterBeforeNewton, ExpectedPamDistance, BINARY_IN_PATH, BINARY_IN_WRAPPER_FOLDER_32, DimensionlessFit, ntRNA, LongInteger_base, BINARY_HARDCODED, NumberErrors, VertexCoverLowerBound, SetRandSeed_value, MST_Qual, Pi, StepsForCG, LongInteger_base2, FollowLine_nmin, printlevel, BINARY_IN_WRAPPER_FOLDER, AveNormSD_Damp, NewNodeName_next, NoSpectralBeforeSD, LinearClassify_X0_i0, NBodyPotential, iii, Minimize_n, RepeatNewtonFactor, MinLen
See also: ?assigned ?types
noeval
Function noeval
Option: builtin
Calling Sequence: noeval(exp)
Parameters:
Name Type
-----------------
exp expression
Returns:
expression
Synopsis: The noeval function delays evaluation of the expression exp. It
simply returns exp.
Examples:
> unevaluated := noeval(1+1);
unevaluated := 1+1
> unevaluated_function := noeval(factorial(5));
unevaluated_function := factorial(5)
> unevaluated;
1+1
> unevaluated_function;
factorial(5)
See also: ?eval
op
Function op - pick up operands of an expression
Option: builtin
Calling Sequence: op(obj)
op(i,obj)
op(i..j,obj)
Parameters:
Name Type Description
--------------------------------------------------------------------
obj {array,equal,list,range,set,structure} an object with parts
i posint
j posint
Returns:
An expression sequence with the components of the object
Synopsis: The op(obj) function strips off the outer-most square brackets [,]
(list, array, matrix) or outer-most braces {,} (set). It returns an
expression sequence with the components. One use of op is to change, for
example, a list into a set. E.g. {op(x)}. When op is given two arguments,
a posint i and an object obj, the function returns the i^th part of obj. If
a range is given, it returns all the i^th through j^th parts of obj.
Examples:
> op([1, [a,b], 4]);
1, [a, b], 4
> op({1..2, {4, 5, {7}}});
1..2, {4,5,{7}}
> z := var = integer;
z := var = integer
> op(1,z);
var
> op(2, z);
integer
See also: ?selectorfunction (select operator a[i])
parse
Function parse
Option: builtin
Calling Sequence: parse(s)
Parameters:
Name Type
-------------------------------------------------------------
s a string with a correct Darwin expression or statement
Returns:
anything : an unevaluated Darwin expression/statement
Synopsis: Parse does the same syntactic analysis that Darwin would do on a
program or interactive command. It returns the object thus created without
evaluation. If the string s has a syntax error, then the command will print
an appropriate error and return an error condition. If more than one
statement is provided in the string, then these are concatenated and a
statement sequence is returned. A terminating semicolon is not necessary,
the parser will add one. Any NULL statement will be ignored.
Examples:
> parse('a+b');
a+b
> eval(parse('xyz := 1'));
1
> xyz;
1
See also: ?eval ?noeval
print
Function print - general pretty-printing
Option: polymorphic
Calling Sequence: print(e1,e2,...)
Parameters:
Name Type Description
-----------------------------
ei anything expression
Returns:
NULL
Synopsis: This function attempts to print out the contents of each e_{i} in a
pretty/readable manner. Any user-defined data structures/classes named, for
example ClassName, can make use of the print command by creating a procedure
named ClassName_print. This routine should detail the manner in which the
data structure is to be sent to the standard output. Any invocation of the
print statement on an object of type ClassName will automatically invoke
this routine. All built-in Darwin data structures have such a routine.
Floating point numbers are printed with 5 significant digits. The global
variable NumberFormat can be assigned a format, as in the printf function,
and all numbers will be printed accordingly.
To print procedures there are two options. Print on a procedure produces a
short description based on the parameters, description field (if any) and
return type. To print the body of the procedure (code) the function
disassemble should be used in conjunction with print. This produces a nice
albeit not perfect formatting.
Examples:
> x:= [[1,2],[3,4]];
x := [[1, 2], [3, 4]]
> print(x);
1 2
3 4
> f := proc(x:positive) description 'test example';
for i to 20 do x+i od; i+sin(x) end:
> print(f);
f: Usage: f( x:positive )
test example
> print(disassemble(op(PartialFraction)));
proc( r:numeric, eps:numeric )
local t, t2;
if nargs = 1 then procname(r, 1e-05)
elif r < 0 then t := procname(-1*r,eps); [-1*t[1],t[2]]
elif 1 < r*eps then [round(r),1]
elif r < eps then [0,1]
elif type(r,integer) then [r,1]
else
t2 := floor(r);
if r-1*t2 < eps then [t2,1]
else t := procname((r-1*t2)^(-1),r^2*eps); [t2*t[1]+t[2],t[1]] fi
fi
end:
See also: ?dprint ?lprint ?printf ?prints
printf
Function printf
Option: builtin
Calling Sequence: printf(textpattern, e1, e2,...)
Parameters:
Name Type
------------------------
textpattern string
ei expression
Returns:
NULL
Synopsis: The printf statement behaves in a similar manner as C's printf
statement.
Conversion characters for the printf command:
Character Description
a prints any Darwin value including lists, sets and structures
A same as a, but will quote strings (same as dprint)
c prints a single character
d prints an integer
e prints a number in exponential notation
f prints a number (decimal notation)
g prints a number (general format, use f or e, whichever is shorter)
e prints a number (explicit exponent)
o prints the octal conversion of an integer
s prints a string (symbol or string)
u prints an unsigned integer
x prints the hexadecimal conversion of an integer
% prints a percent sign %
The cursor control sequences for the printf command:
Character Description
\b backspace
\n carriage return and newline
\t tab
\v newline
\\ single backslash
'' single quote
Examples:
> printf('%a, %a\n', ['L', 'I', 'S', 'T'],
'a means any structure');
[L, I, S, T], a means any structure
> int := 1234;
int := 1234
> printf('|%d|%10d|%-10d|\n', int, int, int);
|1234| 1234|1234 |
> t := 1234.567;
t := 1234.5670
> printf('|%f|%12f|%12.5f|%-12.5f|\n', t, t, t, t);
|1234.567000| 1234.567000| 1234.56700|1234.56700 |
> printf('|%11s|%12s|%12s|%12s|\n', 'normal',
'field of 12', '5 decimal', 'left flush');
| normal| field of 12| 5 decimal| left flush|
See also: ?lprint ?print ?prints ?sprintf ?sscanf
prints
Function prints - print strings in full length
Calling Sequence: prints(string1,...)
Parameters:
Name Type Description
-----------------------------------------
string1 string a string to be printed
Returns:
NULL
Synopsis: Print all the arguments as strings (format %s) ended with a
newline.
See also: ?dprint ?lprint ?print ?printf
product
Function product
Option: builtin
Calling Sequence: product(a)
product(p,i = lo..hi)
product(p,i = s)
Parameters:
Name Type Description
--------------------------------------------------------------------
a list a list of multipliable elements
lo numeric lower bound of index
hi numeric upper bound of index
p anything expression to be multiplied for all index values
s {list,set} set or list of index values
Returns:
numeric
Synopsis: When product is called with a list, the product of all the elements
of the list is computed. The formats with an index variable, i, multiply
the expression p for all the values of the variable i. The expression p is
evaluated each time that i is assigned a value. If a range of values is
given, i is first assigned lo which is incremented by 1 every time. The
expression p is evaluated and multiplied as long as i <= hi. In the third
format, i is assigned all the values of the set or list. The index variable
can be assigned another value, it will not be changed, nor will it disturb
the multiplication.
Examples:
> product([1,3,6]);
18
> product(i,i=1..10);
3628800
> product(i^2,i={2,3,5,7});
44100
> i:=nonsense;
i := nonsense
> product(10*i,i=1..2);
200
> i;
nonsense
> product([]);
1
See also: ?list ?op ?seq ?sum ?zip
regexp
Function regexp( r:string, s:string )
Returns the positions and lengths of regexp r in string s.
remember
Function remember - evaluate a function and remember its result
Option: builtin
Calling Sequence: remember(func_call)
Parameters:
Name Type Description
---------------------------------------
func_call structure a function call
Returns:
anything
Synopsis: The remember function stores results of function evaluations in an
internal table for the purpose of saving computation time. When remember is
called, the system checks to see if the argument function has been called
previously with the same arguments, and if so, then the previous result is
returned. If it is not found, the function call is executed and its result
stored in the internal table as well as returned to the user. The internal
table does not keep all the results forever, at garbage collection time
arguments that are no longer available will cause the corresponding entries
to be removed. Eventually, all unused entries will be removed. The user of
remember should keep in mind that this is a heuristic saving of evaluations,
it should not be counted on happening every time.
Remember is usable when the argument function does not have side effects (for
example printing), as it will be unpredictable when these side effects will
happen. It should also be used on functions which do a significant amount
of computation, else its effort is not justifiable. The profiling tools are
good to determine which functions will profit from remembering.
Warning: When the returned value is a structure (e.g. a matrix or a class),
changing the structure will also change the value stored in the remember-table!
This will lead to unexpected behaviour.
To erase the remember table (for example, because the function to be
remembered has changed its behavior and old values should no longer be
returned), call remember with the argument "erase"; this erases all
previously remembered values.
For the example below we compute the Fibonacci numbers with their simple
recurrence. Without the remember function, this definition takes
exponential time.
Examples:
> F := proc( n:integer )
if n < 2 then n else
remember(F(n-1)) + remember(F(n-2)) fi end:
> [ seq( F(i), i=0..10 )];
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
> F(50);
12586269025
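The lookup-then-store behaviour described in the synopsis, and the aliasing problem from the warning, can be sketched as follows. This is a Python model, not Darwin code: the dictionary _cache, the helper make_row and the call style remember(func, args) are assumptions made for illustration, and unlike Darwin's table this one never drops entries at garbage collection time.

```python
_cache = {}  # stand-in for the internal remember table

def remember(func, *args):
    """Return a stored result if one exists, else compute and store it."""
    key = (func, args)
    if key not in _cache:
        _cache[key] = func(*args)
    return _cache[key]

def F(n):
    # Same recurrence as the Darwin example; each value is computed once.
    if n < 2:
        return n
    return remember(F, n - 1) + remember(F, n - 2)

first = [F(i) for i in range(11)]
big = F(50)

# The warning above: the table holds the same object that was returned,
# so changing the result also changes what later calls get back.
def make_row(n):
    return [0] * n

row = remember(make_row, 3)
row[0] = 99
again = remember(make_row, 3)   # now [99, 0, 0], not [0, 0, 0]
```

In this model the safe pattern is to copy a remembered structure before modifying it.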
See also: ?hash ?profiling ?table
return
Function return
Option: builtin
Calling Sequence: return(obj)
Parameters:
Name Type
---------------
obj anything
Returns:
anything
Synopsis: The return function causes Darwin to immediately exit a procedure
and return obj to the point of calling.
Examples:
> sum_up := proc(x)
total := 0:
for i from 1 to x do
total := total + i
od;
return(total);
end;
sum_up := proc (x) local total, i; total := 0; for i to x do total := total+i od; return(total) end
> sum_up(1772);
1570878
See also: ?exit
round
Function round
Options: builtin, numeric and zippable
Calling Sequence: round(x)
Parameters:
Name Type
--------------
x numeric
Returns:
integer
Synopsis: This function rounds the argument x to the nearest integer. For
exact integers + 1/2, the rounding is done according to the next higher
significant bit (IEEE standard).
Examples:
> round(5.5555555);
6
> round(1.3);
1
> round(-7.8);
-8
See also: ?ceil ?floor ?mod ?trunc
scalb
Function scalb
Options: builtin and numeric
Calling Sequence: scalb(x,n)
Parameters:
Name Type
------------------------
x a numerical value
n an integer
Returns:
numeric
Synopsis: scalb returns the value x multiplied by the base to the power n.
This function is defined in the IEEE 754 floating point standard. For IEEE
754 floating point, the base is 2 and scalb(x,n) = x * 2^n, is computed by
exponent manipulation directly from the representation. Hence it is very
fast and exact.
Examples:
> scalb(1,10);
1024
> scalb(1,-1023);
1.1125e-308
> scalb(0,1024);
0
See also: ?ilogb ?lg
seq
Function seq
Option: builtin
Calling Sequence: seq(e,n)
seq(e,i = lo..hi)
seq(e,i = SetOrList)
Parameters:
Name Type Description
-----------------------------------------------------------
e an arbitrary expression
n integer
i symbol
lo numeric
hi numeric
SetOrList {list,set} set or list of values
Returns:
expression sequence of the e objects
Synopsis: In the first format, an expression sequence with e replicated n
times is returned. This is useful, for example, to create arrays with
initial values and to pad arrays. Normally, expression sequences will be
enclosed in lists, sets or as arguments of functions or data structures. In
the second format, an expression sequence is produced for all the values of
e with the symbol i assigned consecutive values from lo to hi (inclusive).
In both cases, a negative integer or hi < lo will generate an empty
expression sequence. In the third format, the variable i will take all the
values from the set or list.
Examples:
> [seq(7,3)];
[7, 7, 7]
> {seq(2^i,i=0..10)};
{1,2,4,8,16,32,64,128,256,512,1024}
> A(seq(i,i=1.5..2.8));
A(1.5000,2.5000)
> seq(Rand(),4);
0.8632, 0.4194, 0.7952, 0.2781
> [seq(0,5),seq(i,i=1..5),seq(6,3)];
[0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 6, 6]
See also: ?op ?sum ?zip
sequal
Function sequal
Option: builtin
Calling Sequence: sequal(a,b)
Parameters:
Name Type
------------------------------
a an arbitrary expression
b an arbitrary expression
Returns:
true or false
Synopsis: sequal tests for the structural equality of expressions. This
means that if two expressions differ in their structure, but represent the
same value, (e.g. LongInteger(1) and 1), sequal will just test for the
structural equality, and hence sequal(LongInteger(1),1) will return false,
whereas evalb(LongInteger(1)=1) will return true. Quoted strings and
symbols representing the same character sequence, compare equal under normal
equality but will compare different with sequal. The primary use of sequal
is to take advantage of the representation of an object, and test for an
exact representation, other than just a value. Indiscriminate use of sequal
leads to non-polymorphic programs.
Examples:
> sequal(LongInteger(1),1);
false
> evalb(LongInteger(1)=1);
true
> sequal( {1,2,3}, {1,2} );
false
> sequal('abc',abc);
false
See also: ?evalb ?If ?objectorientation
sha2
Function sha2 - Computes SHA2 hash of a string
Option: builtin
Calling Sequence: sha2(s)
Parameters:
Name Type Description
-----------------------------------
s string string to be hashed
Returns:
string
Synopsis: This function computes the 512-bit SHA2 hash value of a given
string. The result is represented as a hex-formatted string.
Examples:
> sha2('abc');
ddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a9eeee64b55d39a2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f
See also: ?hash
sign
Function sign - sign (-1,0,1) of a number or list of numbers
Calling Sequence: sign(val)
Parameters:
Name Type Description
---------------------------------------------------
val {list,numeric} any value or list of values
Returns:
{-1,0,1}
Synopsis: Returns -1 (if val<0), 0 (if val=0) or 1 (if val>0). It also maps
itself onto lists (and hence matrices).
Examples:
> sign(-5);
-1
> sign( [-1,2,-3,4,0] );
[-1, 1, -1, 1, 0]
See also: ?If ?max ?min ?zip
sin
Function sin
Options: builtin, numeric, polymorphic and zippable
Calling Sequence: sin(x)
Parameters:
Name Type
------------------------
x a numerical value
Returns:
numeric
Synopsis: This function computes the trigonometric sine function. sin(x) has
simple zeros at x=n*Pi.
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 4.3
Examples:
> sin(0);
0
> sin(Pi/4);
0.7071
> sin(Pi/2);
1
> sin(-Pi);
-1.2246e-16
See also: ?arcsin ?arctan ?cos ?tan
size
Function size - number of words used by the entire object
Option: builtin
Calling Sequence: size(obj)
Parameters:
Name Type Description
-----------------------------
obj anything any object
Returns:
{0,posint}
Synopsis: Returns the total number of words used by the representation of the
obj in memory. This is the number of words, which depending on the hardware
will be 32 or 64 bits words. See version() for this information. This
should be used mostly for comparative purposes, when two alternatives for
representing some information have to be evaluated. Size will not count the
name of the Class for data structure objects, under the assumption that this
name is defined only once and used many times. Other objects, even if
used repeatedly will be counted entirely.
Examples:
> size('');
3
> size({1,2,{a,b,c}});
21
> size([1,2,3,4]);
7
> size([1.2,2.2,3.2,4.2]);
15
See also: ?assemble ?Class ?CreateArray ?disassemble ?length ?version
sleep
Function sleep
Option: builtin
Calling Sequence: sleep(t)
Parameters:
Name Type
-------------
t posint
Returns:
NULL
Synopsis: This function causes Darwin to sleep (delay execution) for t
seconds. Only the keystroke will interrupt the sleep command.
Examples:
> sleep(1);
See also: ?CallSystem ?getpid ?TimedCallSystem
sort
Function sort - sort a list
Option: builtin
Calling Sequence: sort(L)
sort(L,orderproc)
Parameters:
Name Type Description
----------------------------------------------------------
L list(anything) a list of things to be sorted
orderproc procedure an ordering procedure
Returns:
list(anything)
Synopsis: The sort function can order a list (array) containing any type of
elements as long as these elements are comparable i.e. the operator <= is
applicable and well-defined. When only supplied a list, sort places the
elements in ascending order and returns a copy of the list. The ordering is
ascending for numbers, and for other data structures it is the same order
that sets use. In particular, if there are no duplicate elements in the
input list, sorting without an orderproc or transforming the list into a set
have the same effect. The optional second argument must specify an ordering
procedure. This procedure may have a single argument, in which case it is
understood to return a value on which to order the records, or may take two
arguments, in which case it should return true or false depending on whether
the arguments are in the desired order. In both cases the arguments will be
the entries of the array to be sorted. Sort does not destroy/change its
argument, it returns a new array of (sorted) data. Naturally, sort is most
efficient when called with a single argument.
Examples:
> a := [521, -923, 1293, 521, -3342];
a := [521, -923, 1293, 521, -3342]
> sort(a);
[-3342, -923, 521, 521, 1293]
> a;
[521, -923, 1293, 521, -3342]
> sort(a, x -> -x);
[1293, 521, 521, -923, -3342]
> neg := proc(a) return(-(abs(a))) end;
neg := proc (a) return(-1*|a|) end
> sort(a, neg);
[-3342, 1293, -923, 521, 521]
> b :=[[z, f], [w, e], [y, d]];
b := [[z, f], [w, e], [y, d]]
> sort(b, b->b[2]);
[[y, d], [w, e], [z, f]]
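The two kinds of ordering procedure described in the synopsis (one argument returning a sort value, two arguments returning true or false) can be modelled as below. This is a Python sketch, not Darwin code; the function name sort_like_darwin is hypothetical, and telling the two cases apart by counting parameters is an assumption of this sketch rather than a statement about how Darwin decides.

```python
import inspect
from functools import cmp_to_key

def sort_like_darwin(items, orderproc=None):
    """Return a new sorted list; the input list is left untouched."""
    if orderproc is None:
        return sorted(items)
    arity = len(inspect.signature(orderproc).parameters)
    if arity == 1:
        # One argument: the procedure yields the value to order by.
        return sorted(items, key=orderproc)

    # Two arguments: the procedure says whether (x, y) is in order.
    def compare(x, y):
        xy, yx = orderproc(x, y), orderproc(y, x)
        if xy and not yx:
            return -1
        if yx and not xy:
            return 1
        return 0
    return sorted(items, key=cmp_to_key(compare))

a = [521, -923, 1293, 521, -3342]
plain = sort_like_darwin(a)
by_key = sort_like_darwin(a, lambda x: -x)
by_predicate = sort_like_darwin(a, lambda x, y: x >= y)
by_neg_abs = sort_like_darwin(a, lambda x: -abs(x))
```

Here by_key and by_predicate both give descending order, and by_neg_abs reproduces the result of the neg example above.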
See also: ?set
sprintf
Function sprintf - Storage print - return a string as if printed
Option: builtin
Calling Sequence: sprintf(p,a1..ak)
Parameters:
Name Type
-----------------------------------------------
p pattern (same format as for printf)
ai arguments to be formatted according to p
Returns:
string
Synopsis: This function behaves similarly to C's sprintf function.
Examples:
> i:=5;
i := 5
> j:=6;
j := 6
> sprintf('i and j are: %d %d', i, j);
i and j are: 5 6
See also: ?printf (for a complete list of all conversion codes) ?sscanf
sqrt
Function sqrt - Square Root
Options: builtin, numeric and zippable
Calling Sequence: sqrt(x)
Parameters:
Name Type
-------------------
x numeric >= 0
Returns:
numeric
Synopsis: Computes the square root of x.
Examples:
> sqrt(5);
2.2361
See also: ?Complex ?Polar
sscanf
Function sscanf - String Format Scan
Option: builtin
Calling Sequence: sscanf(txt,pat)
Parameters:
Name Type Description
---------------------------
txt string a string
pat string a pattern
Returns:
list
Synopsis: This function behaves similarly to C's sscanf function.
Examples:
> sscanf('hello 6 3', '%s %d %d');
[hello, 6, 3]
See also: ?printf (for a complete list of all conversion codes) ?sprintf
std
Function std - unbiased estimate of standard deviation of (list of) numbers
Calling Sequence: std(L1,L2,...)
Parameters:
Name Type Description
------------------------------------------------------------
Li {numeric,list(numeric)} a number or list of numbers
Returns:
numeric
Synopsis: Finds the standard deviation of all the values in the arguments.
This is the square root of the unbiased estimator of the variance, that is it
is computed with the formula: sqrt( (sum(x^2) - sum(x)^2/n) / (n-1) ), where n
is the number of x values. This function needs at least two values to compute
successfully.
Examples:
> std(5, 97, 22, [14,15,16] );
34.1609
> std(2,3,5,7,11,13,17,19);
6.3906
See also: ?avg ?max ?median ?min ?var
string
Function string( a )
Converts argument to a string. Multiple arguments are concatenated
string_RGB
Function string_RGB - convert a color name into an RGB vector
Calling Sequence: string_RGB(s)
Parameters:
Name Type Description
--------------------------------------------
s string a color name without spaces
Returns:
nonnegative : nonnegative
Synopsis: This function converts a color name into a 3 value RGB vector. The
vector contains the values for red, green and blue in a scale of 0 to 1.
Black is [0,0,0] and white is [1,1,1]. The name matching is case
independent and it tolerates up to two errors. About 650 colours are known
to this function. The full list can be found at lib/Color.
Examples:
> string_RGB(MidnightBlue);
[0.09803922, 0.09803922, 0.4392]
> string_RGB(midnightBLAU);
[0.09803922, 0.09803922, 0.4392]
> string_RGB(chocolate);
[0.8235, 0.4118, 0.1176]
See also: ?Color ?DrawTree ?RGB_string
subs
Function subs - substitute occurrences of subexpressions
Calling Sequence: subs(val1 = repl1,val2 = repl2,...,s)
Parameters:
Name Type Description
-------------------------------------------------
val.i anything an object to be replaced in s
repl.i anything the replacement of val.i
s anything an arbitrary object
Returns:
anything
Synopsis: The function subs creates a new expression, substituting every
occurrence of the given values by the corresponding replacements. The
substitutions happen left-to-right for the entire s.
Examples:
> subs(3=abc,[1,2,3]);
[1, 2, abc]
> subs(2=77,[77]=abc,A(1,[2],3));
A(1,abc,3)
> subs(A=B,A(11,22));
B(11,22)
See also: ?coeff ?has ?hastype ?indets ?lcoeff ?mselect ?noeval ?types
subset
Function subset
Option: builtin
Calling Sequence: subset(a,b)
Parameters:
Name Type
-----------
a set
b set
Returns:
boolean
Synopsis: The subset function returns true if and only if every element in
set a is in set b.
Examples:
> subset({1,2,3}, {1,2,3,4});
true
See also: ?intersect ?member ?minus ?union
sum
Function sum
Option: builtin
Calling Sequence: sum(a)
sum(p,i = lo..hi)
sum(p,i = s)
Parameters:
Name Type Description
----------------------------------------------------------------
a list a list of summable elements
lo numeric lower index of summation
hi numeric upper bound of summation
p anything expression to be summed for all index values
s {list,set} set or list of index values
Returns:
numeric
Synopsis: When sum is called with a list, the sum of all the elements of the
list is computed. The formats with an index variable, i, sum the expression
p for all the values of the variable i. The expression p is evaluated each
time that i is assigned a value. If a range of values is given, i is first
assigned lo which is incremented by 1 every time. The expression p is
evaluated and summed as long as i <= hi. In the third format, i is assigned
all the values of the set or list. If sum is applied to a matrix, the rows
are added together, which is an easy way of obtaining the column sums. If it
is applied twice to a matrix it will return the sum of all the elements of the
matrix. The summation variable can be assigned another value, it will not be
changed, nor will it disturb the summation.
Examples:
> sum([1,3,19]);
23
> sum(1/i,i=1..1000);
7.4855
> sum(i^2,i={2,3,5,7});
87
> A := [[1,2,3],[2,2,2]]:
> sum(A);
[3, 4, 5]
> sum(sum(A));
12
> i:=nonsense;
i := nonsense
> sum(10*i,i=1.53 .. 2);
15.3000
> i;
nonsense
See also: ?list ?matrix ?op ?product ?seq ?zip
symbol
Function symbol
Option: builtin
Calling Sequence: symbol(s)
Parameters:
Name Type
---------------
s a string
Returns:
symbol
Synopsis: Symbol transforms a string into a symbol (a Darwin variable that
can hold values). This is typically needed when a name is formed by
concatenation or as a result of an sprintf() command. The symbol obtained
always refers to a global symbol, never to a local or to a parameter, when
computed inside a procedure.
Examples:
> symbol(a.b);
ab
> type(symbol(a.b));
symbol
> type(a.b);
string
See also: ?names ?string
table
Class table - structure to store and retrieve elements by name
Template: table()
table(unassig)
Fields:
Name Type Description
----------------------------------------------------------------------------
unassig anything value to be returned for an unassigned entry
procedure a procedure that will be invoked on unassigned entries
key anything key value for accessing or storing in table
Returns:
table
Methods: list plus print table_type
Synopsis: A table stores arbitrary values or structures, which can be
accessed by a key. The key can be any valid object in Darwin. The access
to the table is done with normal indexing and the assignment of values is
done with assignments. When a non-existent element is accessed, a special
value is returned. By default this value is the symbol "unassigned". It
can be changed to any other value. If the default value is a procedure, it
will be understood that the value to be returned on a non-existent entry is
the result of computing the procedure over the argument. For sparse
numerical tables, it is convenient to set the unassigned value to 0 so
addition into the table can be done directly.
To test if an entry is assigned or not, it is not possible to use the function
assigned, as the table is not a name, and non-existent entries are
automatically considered to have the default value. Instead, testing for
the default value should be used.
The iterator Indices() will operate on a table and iterate over all the
existing (assigned) indices of the table.
Examples:
> Kingdom := table(unknown):
> Kingdom[mouse] := Eukaryota: Kingdom[ecoli] := Bacterium:
> [Kingdom[mouse], Kingdom[rat]];
[Eukaryota, unknown]
> print(Kingdom);
ecoli --> Bacterium
mouse --> Eukaryota
> Kingdom[ecoli] := Bacteria;
Kingdom[ecoli] := Bacteria
> for z in Indices(Kingdom) do lprint(z,Kingdom[z]) od;
ecoli Bacteria
mouse Eukaryota
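The default-value behaviour described in the synopsis can be modelled with a short sketch. It is written in Python, not Darwin; the class name Table and the variable names are hypothetical, and passing the missing key to a procedure default is an assumption about what "the argument" refers to.

```python
class Table(dict):
    """Dictionary that returns a default for missing keys without storing it."""

    def __init__(self, unassigned="unassigned"):
        super().__init__()
        self.unassigned = unassigned

    def __missing__(self, key):
        # A procedure default is computed from the key (assumption);
        # any other default is returned as is.
        if callable(self.unassigned):
            return self.unassigned(key)
        return self.unassigned

kingdom = Table("unknown")
kingdom["mouse"] = "Eukaryota"
kingdom["ecoli"] = "Bacteria"
looked_up = [kingdom["mouse"], kingdom["rat"]]   # rat was never assigned
indices = sorted(kingdom)                        # only assigned keys

# Sparse numerical table: a default of 0 allows adding into entries directly.
counts = Table(0)
for word in ["a", "b", "a"]:
    counts[word] = counts[word] + 1

# Procedure default: the value is computed on access.
lengths = Table(lambda key: len(key))
computed = lengths["abc"]
```

As in the text above, reading a missing entry does not create it, so testing for an assigned entry means comparing against the default value.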
See Also:
?assigned ?SearchAllArray ?SearchOrderedArray ?subset
?member ?SearchArray ?set ?Table
tan
Function tan
Options: builtin, numeric, polymorphic and zippable
Calling Sequence: tan(x)
Parameters:
Name Type
------------------------
x a numerical value
Returns:
numeric
Synopsis: This function computes the trigonometric tangent function defined
by: tan(x) = sin(x)/cos(x). tan(x) has simple poles at x=Pi/2+n*Pi.
References: Handbook of Mathematical Functions, M. Abramowitz and I. Stegun,
Ch 4.3
Examples:
> tan(0);
0
> tan(Pi/4);
1.0000
> tan(-Pi);
1.2246e-16
See also: ?arcsin ?arctan ?cos ?sin
time
Function time
Option: builtin
Calling Sequence: time()
time(expr)
Returns:
expression
Synopsis: This function returns the time needed to evaluate expr. If no
expression is specified, it returns the total CPU time used by the current
session of Darwin. If time is called with the string "all", then the total
CPU time of the process and all its children is returned. This is useful to
find the total time used when Darwin calls other programs.
Examples:
> time();
37.5100
> time(exp(1.7 * 3.14));
0
> time(all);
40.9000
See also: ?date ?gigahertz ?TimedCallSystem ?UTCTime
transpose
Function transpose - transpose a matrix
Option: builtin
Calling Sequence: transpose(A)
A^T
A^t
Parameters:
Name Type
-----------------------
A matrix(anything)
Returns:
matrix(anything)
Synopsis: Computes the transpose, A^T, of a matrix A. (The transpose of a
matrix is produced by replacing entry A_ij with entry A_ji for all i, j.)
Transposition can also be achieved through the use of the exponent T or t.
For this to work properly, T or t must not be assigned. Transpose will also
work for higher order arrays.
Examples:
> A := transpose([[1, 2, 3], [4, 5, 6], [7, 8, 9]]);
A := [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
> A^t;
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
See Also:
?Cholesky ?GaussElim ?LinearProgramming
?convolve ?GivensElim ?matrix
?Eigenvalues ?Identity ?matrix_inverse
traperror
Function traperror
Option: builtin
Calling Sequence: traperror(exp)
Parameters:
Name Type
-----------------
exp expression
Returns:
The result of evaluating exp or a string.
Synopsis: If an error occurs while evaluating exp, the traperror function
returns a string consisting of the Darwin error message. Execution does
not halt. If no error occurs, traperror simply returns the result of
evaluating exp.
Examples:
> traperror( undefined_symbol/20 );
undefined_symbol, variable not assigned, invalid term in product
See also: ?error ?lasterror
trim
Function trim - Removes leading and trailing whitespace from a string
Calling Sequence: trim(s)
trim(s,chars)
Parameters:
Name Type Description
------------------------------------------------------
s string string to be trimmed
chars set (optional) set of chars to be removed
Returns:
string
Synopsis: Return a copy of the string s with leading and trailing whitespace
removed. If chars is not specified, the following characters are considered
to be whitespaces: ' ','\t','\n','\r' and '\0'.
Examples:
> trim(' Hello ');
Hello
> trim('a World ',{' ','a'});
World
See also: ?ConcatStrings ?RenderTemplate ?string
trunc
Function trunc
Options: builtin, numeric and zippable
Calling Sequence: trunc(x)
Parameters:
Name Type
--------------
x numeric
Returns:
integer
Synopsis: Returns the integer portion of the argument x.
Examples:
> trunc(9.9999999);
9
> trunc(-9.99999);
-9
See also: ?ceil ?floor ?mod ?round
type
Function type - type testing
Option: builtin
Calling Sequence: type(exp)
type(exp,typeeval)
Parameters:
Name Type Description
-----------------------------------
exp anything an expression
typeeval any type
Returns:
{boolean,type}
Synopsis: The type function with two arguments returns true if the type of
evaluated exp is typeeval. Otherwise, it returns false. With a single
argument, it returns the type of expression exp.
Examples:
> type(a, anything);
true
> type(5, integer);
true
> type('hello', string);
true
> type('abc');
string
See Also:
?types (This gives a full description of the valid types and
their compositions)
union
Function union
Options: builtin and polymorphic
Calling Sequence: a union b
union(a,b)
Parameters:
Name Type
-----------
a set
b set
Returns:
set
Synopsis: Computes the union of two sets, that is a set which has all the
elements of a and b. Repeated elements are removed from the resulting set.
The value intersect() is understood to be the entire universe, and hence
unions including intersect() will simply return this term. In its
functional form, any arbitrary number of sets can be unioned. In
particular, union(a) = a, and union() = {}.
Examples:
> {1,2,3} union {2,3,4};
{1,2,3,4}
> {1,2,3} union {};
{1,2,3}
> {1,2,3} union intersect();
intersect()
See also: ?intersect ?member ?minus ?subset
uppercase
Function uppercase
Option: builtin
Calling Sequence: uppercase(t)
Parameters:
Name Type
-------------
t string
Returns:
string
Synopsis: This function returns the string t converted to uppercase.
Examples:
> uppercase('I have been converted');
I HAVE BEEN CONVERTED
See also: ?lowercase
var
Function var - unbiased estimate of variance of (list of) numbers
Calling Sequence: var(L1,L2,...)
Parameters:
Name Type Description
------------------------------------------------------------
Li {numeric,list(numeric)} a number or list of numbers
Returns:
numeric
Synopsis: Finds the variance of all the values in the arguments. This is an
unbiased estimator of the variance, that is it is computed with the formula:
(sum(x^2) - sum(x)^2/n) / (n-1), where n is the number of x values. This
function needs at least two values to compute successfully.
Examples:
> var(5, 97, 22, [14,15,16] );
1166.9667
> var(2,3,5,7,11,13,17,19);
40.8393
See also: ?avg ?max ?median ?min ?std
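The formula in the Synopsis can be checked by hand. The Python sketch below (not Darwin code) flattens numbers and lists of numbers into one sample and applies (sum(x^2) - sum(x)^2/n) / (n-1); the helper name flatten and the error raised for fewer than two values are assumptions made for illustration.

```python
def flatten(args):
    """Yield every number from a mix of numbers and lists of numbers."""
    for a in args:
        if isinstance(a, (list, tuple)):
            yield from a
        else:
            yield a

def var(*args):
    """Unbiased sample variance of all values found in the arguments."""
    xs = list(flatten(args))
    n = len(xs)
    if n < 2:
        raise ValueError("var needs at least two values")
    s = sum(xs)
    s2 = sum(x * x for x in xs)
    return (s2 - s * s / n) / (n - 1)

print(round(var(5, 97, 22, [14, 15, 16]), 4))      # 1166.9667
print(round(var(2, 3, 5, 7, 11, 13, 17, 19), 4))   # 40.8393
```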
version
Function version
Option: builtin
Calling Sequence: version()
Returns:
expseq
Synopsis: Returns an expression sequence with 8 components:
1 VersionType: string, Production or Debug
2 Architecture: string, encoded name of architecture
3 Version: number
4 CompiledWith: string, name of the compiler used
5 CompilerVersion: string
6 CompilerOptions: string
7 DateCompiled: string, result of system command date
8 CharactersPerWord: posint, number of characters per word
Examples:
> version();
RelWithDebInfo, Linux, 4, /usr/bin/gcc, 4.4.3, -static, Tue Feb 19 10:53:16 CET 2013, 8, ON
warning
Function warning - outputs warning string on STDERR
Option: builtin
Calling Sequence: warning(txt)
Parameters:
Name Type Description
-------------------------------------------------
txt string the warning message to be printed
Returns:
NULL
Synopsis: This function outputs a warning message on the error stream.
Examples:
> warning('This is a warning');
WARNING: This is a warning
See also: ?error ?lasterror ?traperror
zip
Function zip - compute an expression for each component
Option: builtin
Calling Sequence: zip(expr)
Parameters:
Name Type
-----------------
expr expression
Returns:
list(expression)
Synopsis: Compute an expression over the components of a list element-wise.
Zip gets its name from an operation like a+b, where a and b are lists of the
same length and the result is the component-wise sum of each element of a
and b. It is like zipping the two vectors together. In general, if the
expr is an expression which contains vectors (or list or sets), and all
these lists or sets are of the same length, then zip will compute the
expression for each value of the lists/sets and return a list with the
results. The arguments or components of expr which are not list/sets will
be taken as constants. Notice that even if the arguments contain only sets,
zip will still return a list.
Examples:
> zip( sin( [1,2,3] ));
[0.8415, 0.9093, 0.1411]
> f := proc(a,b,c) a*b+c end;
f := proc (a, b, c) a*b+c end
> zip( f( [1,2,3], 10, {0.1,0.2,0.3} ));
[10.1000, 20.2000, 30.3000]
> zip( f( 1, [2,3,4,5], Pi-3 ));
[2.1416, 3.1416, 4.1416, 5.1416]
See also: ?op ?seq ?sum
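The element-wise evaluation described above can be modelled as follows. This is a Python sketch, not Darwin code: the name zip_apply is hypothetical, the function takes the procedure and its arguments separately (Darwin's zip takes a single expression), and sets are iterated in sorted order as a stand-in for Darwin's own set ordering.

```python
import math

def zip_apply(f, *args):
    """Apply f once per position; lists/sets vary, other arguments are constants."""
    prepared = []
    lengths = set()
    for a in args:
        if isinstance(a, (set, frozenset)):
            a = sorted(a)  # assumption: sorted order models Darwin's set order
        if isinstance(a, (list, tuple)):
            lengths.add(len(a))
            prepared.append((True, list(a)))
        else:
            prepared.append((False, a))
    if len(lengths) != 1:
        raise ValueError("need at least one list/set, all of the same length")
    n = lengths.pop()
    # The result is always a list, even when every varying argument was a set.
    return [f(*[v[i] if is_seq else v for is_seq, v in prepared])
            for i in range(n)]

f = lambda a, b, c: a * b + c
print(zip_apply(math.sin, [1, 2, 3]))
print(zip_apply(f, [1, 2, 3], 10, {0.1, 0.2, 0.3}))
print(zip_apply(f, 1, [2, 3, 4, 5], math.pi - 3))
```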
DBL_EPSILON
System variable DBL_EPSILON
Synopsis: The system variable DBL_EPSILON has the property that it is the
smallest number where 1+DBL_EPSILON <> 1.
See also: ?DBL_MAX
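The defining property can be observed in any environment that uses IEEE 754 double precision. The Python sketch below (not Darwin code) assumes that Darwin's DBL_EPSILON corresponds to the usual double-precision machine epsilon, which Python exposes as sys.float_info.epsilon.

```python
import sys

eps = sys.float_info.epsilon   # machine epsilon for double precision

print(1.0 + eps != 1.0)        # adding eps changes the value of 1.0
print(1.0 + eps / 2 == 1.0)    # adding half of eps is rounded away
```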
DBL_MAX
System variable DBL_MAX
Synopsis: The system variable holds the value of the maximum double numeric
allowed in Darwin. This variable is set in the library file darwinit. The
LongInteger() routines in Darwin allow for larger integers.
Examples:
> DBL_MAX;
1.7976931348623147e+308
See also: ?LongInteger
DMS
System variable DMS
Synopsis: The DMS (Dayhoff matrices) system variable has type list(DayMatrix)
and contains 1266 Dayhoff matrices for various PAM distances between 0.049
and 1000 after a call to the function CreateDayMatrices(). Some routines
perform all operations under the assumption that the Dayhoff matrices
currently contained in DMS are the correct Dayhoff matrices to use.
See also: ?CreateDayMatrices ?CreateDayMatrix ?DayMatrix ?DM
DM
System variable DM
Synopsis: The DM (Dayhoff matrix) system variable has type DayMatrix and
contains a Dayhoff matrix computed at PAM distance 250 after a call to the
function CreateDayMatrices(). Some routines perform all operations under
the assumption that the Dayhoff matrix currently contained in DM is the
correct Dayhoff matrix to use.
See also: ?CreateDayMatrices ?CreateDayMatrix ?DayMatrix ?DMS
DigestionWeights
Class DigestionWeights - data structure to hold digestion information
Template: DigestionWeights(digestor,weights)
Fields:
Name Type Description
-----------------------------------------------------------------
digestor string name of the digestion enzyme
weights numeric molecular weights of the fragments
{equation,symbol} amino acid weight modification
Returns:
DigestionWeights
Methods: DigestionWeights_type
Synopsis: DigestionWeights is a data structure used to hold the name of the
digestion enzyme followed by the weights obtained from the digestion. See
?enzymes for a complete description of the enzymes being recognized and
their properties. Additionally we can specify various conditions that
result in weight modifications of the amino acids. The weight modifications
can be placed anywhere in the list of weights and are all optional.
Currently these are:
C=208.29 An equation with a one-letter code on the lhs and a weight on the
right indicates to the program that the given amino acid (due to some
modification pre/post digestion) has the given weight.
Deuterated This word will indicate that all the hydrogen atoms have been
exchanged with Deuterium, and hence the weights of all aa should be adjusted
accordingly.
If the digestor is CNBr or TrypsinCysModified or NTCB, changes to the weights
are made automatically.
Examples:
> DigestionWeights('Trypsin',
601.9438, 504.0904, 1512.4545, 480, 590);
DigestionWeights(Trypsin,601.9438,504.0904,1512.4545,480,590)
See Also:
?DigestAspN ?DigestWeights ?enzymes ?ProbCloseMatches
?DigestSeq ?DynProgMass ?MassProfileResults ?Protein
?DigestTrypsin ?DynProgMassDb ?ProbBallsBoxes ?SearchMassDb
EOF
System variable EOF
Synopsis: The system variable EOF is used to mark the end of a file.
See also:
Edges
Class Edges
Template: Edges(L)
Fields:
Name Type
---------------------------
L { list(Edge), NULL }
Returns:
Edges
Methods: Edges_type set
Synopsis: The Edges structure is the first field of a Graph data structure.
It consists of a list of Edge structures.
Examples:
> G := Graph( Edges( Edge(4,1,2), Edge(7,1,3), Edge(6,2,4),
Edge(5,3,4) ), Nodes(1, 2, 3, 4) );
G := Graph(Edges(Edge(4,1,2),Edge(7,1,3),Edge(6,2,4),Edge(5,3,4)),Nodes(1,2,3,4))
See Also:
?BipartiteGraph ?Graph_Rand ?ParseDimacsGraph
?Clique ?Graph_XGMML ?Path
?DrawGraph ?InduceGraph ?RegularGraph
?Edge ?MaxCut ?ShortestPath
?EdgeComplement ?MaxEdgeWeightClique ?TetrahedronGraph
?FindConnectedComponents ?MinCut ?VertexCover
?Graph ?MST
?Graph_minus ?Nodes
Entries
Iterator Entries - iterates over all entries in a database
Usage: for z in Entries() do ... od;
Returns:
Entry
Synopsis: This is an iterator which returns all the entries from the default
database (stored in DB). The entries are returned in order, i.e. Entry(1),
Entry(2), etc.
See Also:
?AC ?GetEntryInfo ?ID ?PatEntry ?Sequences
?Entry ?GetEntryNumber ?iterator ?Sequence
Infix
Iterator Infix - walks over all the nodes of a tree in infix order
Usage: for n in Infix(tree) do ... od;
Parameters:
Name Type Description
----------------------------
tree Tree a general tree
Returns:
Tree
Synopsis: This is an iterator which returns all the nodes (internal nodes of
type "Tree" or external nodes of type "Leaf") of a tree in infix order.
Infix order means that the left subtree is visited first, then the node,
then the right subtree, for every node recursively.
See Also:
?iterate ?Leaf ?objectorientation ?Prefix
?iterator ?Leaves ?Postfix
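The infix visiting order can be sketched with a generator. The following Python model is not Darwin code; the Tree and Leaf classes here are simplified, hypothetical stand-ins that carry only a label and two children.

```python
class Leaf:
    """External node (hypothetical minimal stand-in)."""
    def __init__(self, label):
        self.label = label

class Tree:
    """Internal node with a left and a right subtree (hypothetical stand-in)."""
    def __init__(self, left, label, right):
        self.left, self.label, self.right = left, label, right

def infix(node):
    """Yield nodes in infix order: left subtree, the node itself, right subtree."""
    if isinstance(node, Leaf):
        yield node
    else:
        yield from infix(node.left)
        yield node
        yield from infix(node.right)

t = Tree(Tree(Leaf("a"), "x", Leaf("b")), "r", Leaf("c"))
print([n.label for n in infix(t)])   # ['a', 'x', 'b', 'r', 'c']
```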
Leaves
Iterator Leaves - walks over all the leaves of a tree
Usage: for n in Leaves(tree) do ... od;
Parameters:
Name Type Description
----------------------------
tree Tree a general tree
Returns:
Leaf
Synopsis: This is an iterator which returns all the leaves of a tree in infix
order.
See Also:
?Infix ?iterator ?objectorientation ?Prefix
?iterate ?Leaf ?Postfix
Lines
Iterator Lines - iterates over all lines in a string
Usage: for z in Lines(s) do ... od;
Parameters:
Name Type Description
---------------------------
s string any string
Returns:
string
Synopsis: This is an iterator which returns all the lines of a string
(separated by a '\n' character) in the original order. The newline character
at the end of each line is also included in the return value.
See Also:
?iterate ?objectorientation ?SplitLines
?iterator ?SearchDelim ?string
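The behaviour described above, where each returned line keeps its trailing newline, can be modelled as follows. This is a Python sketch, not Darwin code; how a final line without a trailing newline is handled is an assumption of the sketch.

```python
def lines(s):
    """Yield the lines of s in order, each including its trailing newline."""
    start = 0
    while start < len(s):
        end = s.find("\n", start)
        if end == -1:
            # Assumption: a last line without '\n' is still returned as is.
            yield s[start:]
            return
        yield s[start:end + 1]
        start = end + 1

for z in lines("first\nsecond\n"):
    print(repr(z))
```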
List
Class List - holds contents of a List of displayable items
Template: List(labelling,item1,item2,...)
Returns:
List
Fields:
Name Type Description
-----------------------------------------------------------------
labelling {procedure,string} labelling method
item_i {string,structure} text or structure for each entry
Methods: HTMLC LaTeXC List_type print string
Synopsis: The List structure holds information which will be formatted as a
simple list. The first argument is a procedure which should produce a
string for each integer argument. This will be the label that is used for
each entry in the list. If the first argument is a string with a "%" in it,
it is interpreted as a format argument for sprintf. This is an easy way to
provide arbitrary formatting of numbers. If it is any other string, that
string is used for all items in the list. A list is normally part of a Document or
some other structure intended for display or human-readable purposes. The
following table shows some common labelling functions and their results for
a few integers:
procedure 1 2 10 20 30
--------------------------------------------------------------
Roman I II X XX XXX
Alphabetical A B J T AD
x->lowercase(Roman(x)) i ii x xx xxx
x->sprintf('(%s)',Alphabetical(x)) (A) (B) (J) (T) (AD)
'(%d)' (1) (2) (10) (20) (30)
'o' o o o o o
Examples:
> string( List('--%d--',First,Second));
--1-- First
--2-- Second
See Also:
?Block ?Document ?latex ?RunDarwinSession
?Code ?HTML ?Paragraph ?screenwidth
?Color ?HyperLink ?PostscriptFigure ?Table
?Copyright ?Indent ?print ?TT
?DocEl ?LastUpdatedBy ?Roman ?View
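The three kinds of labelling argument (a procedure, a format string containing "%", or a fixed string) can be modelled as below. This is a Python sketch, not Darwin code; roman and alphabetical are re-implementations written to match the values in the table above, and their behaviour outside those values is an assumption.

```python
def roman(n):
    """Roman numeral for a positive integer."""
    numerals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

def alphabetical(n):
    """A, B, ..., Z, AA, AB, ... for n = 1, 2, ... (bijective base 26)."""
    out = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        out = chr(ord("A") + r) + out
    return out

def make_label(labelling, i):
    """Produce the label for item i from a procedure, format, or fixed string."""
    if callable(labelling):
        return labelling(i)
    if "%" in labelling:
        return labelling % i
    return labelling

for i in (1, 2, 10, 20, 30):
    print(make_label(roman, i), make_label(alphabetical, i),
          make_label("(%d)", i), make_label("o", i))
```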
NULL
System variable NULL
Synopsis: The NULL expression sequence.
Nodes
Class Nodes
Template: Nodes(N)
Fields:
Name Type
--------------------------------
N {list({posint, 0}), NULL}
Returns:
Nodes
Methods: Nodes_type
Synopsis: The Nodes structure holds the list of labels for nodes in a graph.
Examples:
> G := Graph( Edges( Edge(4,1,2), Edge(7,1,3), Edge(6,2,4),
Edge(5,3,4) ), Nodes(1, 2, 3, 4) );
G := Graph(Edges(Edge(4,1,2),Edge(7,1,3),Edge(6,2,4),Edge(5,3,4)),Nodes(1,2,3,4))
See Also:
?BipartiteGraph ?Graph_minus ?ParseDimacsGraph
?Clique ?Graph_Rand ?Path
?DrawGraph ?Graph_XGMML ?RegularGraph
?Edge ?InduceGraph ?ShortestPath
?EdgeComplement ?MaxCut ?TetrahedronGraph
?Edges ?MaxEdgeWeightClique ?VertexCover
?FindConnectedComponents ?MinCut
?Graph ?MST
NucDB
System variable NucDB
Synopsis: This system variable is used to point to a database containing
nucleotide or ribonucleotide sequences.
See also: ?DB ?PepDB
PepDB
System variable PepDB
Synopsis: This system variable is used to point to a database containing
amino acid sequences.
See also: ?DB ?NucDB
Pi
System variable Pi
Synopsis: Contains the value of Pi.
Postfix
Iterator Postfix - walks over all the nodes of a tree in postfix order
Usage: for n in Postfix(tree) do ... od;
Parameters:
Name Type Description
----------------------------
tree Tree a general tree
Returns:
Tree
Synopsis: This is an iterator which returns all the nodes (internal nodes of
type "Tree" or external nodes of type "Leaf") of a tree in postfix order.
Postfix order means that the left subtree is visited first, then the right
subtree, then the node, for every node recursively.
See Also:
?Infix ?iterator ?Leaves ?Prefix
?iterate ?Leaf ?objectorientation
Prefix
Iterator Prefix - walks over all the nodes of a tree in prefix order
Usage: for n in Prefix(tree) do ... od;
Parameters:
Name Type Description
----------------------------
tree Tree a general tree
Returns:
Tree
Synopsis: This is an iterator which returns all the nodes (internal nodes of
type "Tree" or external nodes of type "Leaf") of a tree in prefix order.
Prefix order means that the node is visited first, then the left subtree,
then the right subtree, for every node recursively.
See Also:
?Infix ?iterator ?Leaves ?Postfix
?iterate ?Leaf ?objectorientation
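Prefix and postfix order differ only in when the node itself is visited relative to its subtrees. The Python sketch below (not Darwin code) shows both; the Tree and Leaf classes are simplified, hypothetical stand-ins that carry only a label and two children.

```python
class Leaf:
    """External node (hypothetical minimal stand-in)."""
    def __init__(self, label):
        self.label = label

class Tree:
    """Internal node with a left and a right subtree (hypothetical stand-in)."""
    def __init__(self, left, label, right):
        self.left, self.label, self.right = left, label, right

def prefix(node):
    """Yield the node first, then its left subtree, then its right subtree."""
    yield node
    if isinstance(node, Tree):
        yield from prefix(node.left)
        yield from prefix(node.right)

def postfix(node):
    """Yield the left subtree, then the right subtree, then the node."""
    if isinstance(node, Tree):
        yield from postfix(node.left)
        yield from postfix(node.right)
    yield node

t = Tree(Tree(Leaf("a"), "x", Leaf("b")), "r", Leaf("c"))
print([n.label for n in prefix(t)])    # ['r', 'x', 'a', 'b', 'c']
print([n.label for n in postfix(t)])   # ['a', 'b', 'x', 'c', 'r']
```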
Primes
Iterator Primes - generates the prime numbers
Usage: for n in Primes() do ... od;
Returns:
posint
Synopsis: This is an iterator which returns all the prime numbers in
increasing order.
See also: ?iterate ?iterator ?objectorientation
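An unbounded iterator of this kind can be modelled with a generator. The Python sketch below (not Darwin code) uses trial division by the primes found so far; this is one simple way to produce the sequence and says nothing about how Darwin generates it.

```python
import itertools

def primes():
    """Yield the prime numbers in increasing order, without an upper bound."""
    found = []
    for n in itertools.count(2):
        # n is prime if no previously found prime p with p*p <= n divides it.
        small = itertools.takewhile(lambda p: p * p <= n, found)
        if all(n % p for p in small):
            found.append(n)
            yield n

# As with the Darwin iterator, the consumer decides when to stop.
for p in primes():
    if p > 20:
        break
    print(p)
```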
Protein
Class Protein - data structure to hold SearchMassDb data
Template: Protein(ApproxMass,DigestionWeights())
Protein(ApproxMass,DigestionMono())
Fields:
Name Type Description
-----------------------------------------------------------------------
ApproxMass structure approximate mass in Daltons
DigestionWeights structure weights obtained from using the digestor
DigestionMono structure as above but using monoisotopic masses
Returns:
Protein
Methods: Protein_type Rand
Synopsis: Protein is a data structure that holds the approximate mass in an
ApproxMass data structure and the digestion weights in either a
DigestionWeights or a DigestionMono data structure. It is used as input to
the SearchMassDb function.
Examples:
> Protein(ApproxMass(65800),DigestionWeights('Trypsin',601.9438, 504.0904, 1512.4545, 480, 590, 700, 998));
Protein(ApproxMass(65800),DigestionWeights(Trypsin,601.9438,504.0904,1512.4545,480,590,700,998))
See Also:
?DigestAspN ?DigestWeights ?MassProfileResults
?DigestionWeights ?DynProgMass ?ProbBallsBoxes
?DigestSeq ?DynProgMassDb ?ProbCloseMatches
?DigestTrypsin ?enzymes ?SearchMassDb
Sequences
Iterator Sequences - iterates over all sequences in a database
Usage: for z in Sequences() do ... od;
Returns:
Sequence
Synopsis: This is an iterator which returns all the sequences from the
default database (stored in DB). The sequences are returned in order, i.e.
Sequence(Entry(1)), Sequence(Entry(2)), etc.
See Also:
?AC ?Entry ?GetEntryNumber ?iterator ?Sequence
?Entries ?GetEntryInfo ?ID ?PatEntry
database
Class database - Peptide or Nucleotide database
Template: ReadDb(dbname)
Returns:
database
Fields:
Name Type Description
----------------------------------------------------------------------------
Entry,i string the offset into the database of the ith entry.
For programming convenience, the offset just beyond
the last entry is defined as DB[TotChars]
FileName string name of the external file containing the database
Pat,i integer the ith entry of the Pat index on the data, an
integer offset
string string the entire database as a string
TotAA posint number of amino acids or bases in the database
TotChars posint number of characters in the database
TotEntries posint number of entries in the database
type string dna, rna, mixed or peptide
Methods: database_type
Synopsis: A database (DNA, RNA, mixed or peptide) is loaded with the command
ReadDb. The database needs to be loaded for most operations involving
sequences and alignments. The database is always available in the global
variable DB. A database can be assigned to any other name, but certain
operations, like finding an Entry, or using the Pat index, will perform on
the database which is assigned to the global variable DB. All the selectors
are read-only, they cannot be modified.
The database consists of an SGML-formatted file which contains the information
about entries and sequences. For a file to be successfully loaded as a
database, there have to be entries (tagged between <E> and </E>). Within
each entry there should be a sequence (tagged between <SEQ> and </SEQ>) of
peptides, DNA or RNA. The first time that a database is loaded, two index
files are constructed. One contains the Pat index and it is stored under
the name dbname.tree and the other is a quick reference for entries and is
stored under the name dbname.map. If the database under dbname is changed,
these two files (dbname.tree and dbname.map) should be removed to force
ReadDb to rebuild them.
The Pat index maintains a total order among all the subsequences of the SEQ
fields of the entries. There are as many entries in the Pat index as amino
acids (or bases) in the entire database. If a Pat index is not desired,
creating a null dbname.tree file will prevent ReadDb from building a Pat
index.
Examples:
> DB := ReadDb('/home/darwin/DB/SwissProt.Z'):;
Peptide file(/home/darwin/DB/SP45.0/SwissProt45.0(169638448), 163235
entries, 59631787 aminoacids)
See Also:
?AC ?GetOffset ?ReadDb ?SearchSeqDb
?ConsistentGenome ?ID ?SearchDb ?SearchTag
?Entry ?Offset ?SearchFrag ?Sequence
?GenomeSummary ?PatEntry ?SearchID
lasterror
System variable lasterror
Synopsis: Contains the last error message generated by Darwin during the
current session.
See also: ?error ?traperror
libname
System variable libname
Synopsis: The libname system variable stores the path of the Darwin library.
It is set by the -l flag when executing Darwin from the command line.
list
Class list - list or array of arbitrary elements
Template: []
[a]
[a,...]
Fields:
Name Type Description
---------------------------------------------------------------------
a anything
i the ith element in the list
i..j a sublist of elements from the list
Returns:
list
Methods: HTMLC list_type power Rand Row Table
Synopsis: A list holds arbitrary values or structures. Elements in the list
are left in the order the list was created. A list is also an array. A list
of lists (of the same length) is a matrix. Elements of a list can be
replaced with an assignment statement. Arithmetic operations work on lists
(arrays) and lists of lists (matrices) according to the normal rules of
linear algebra. (See examples) As an array, the list has no interpretation
of column or row. It will act as column or row depending on the operation
performed on it. When selecting with an integer range, negative values are
interpreted as counting from the right. I.e. -2..-1 select the last two
elements of the list.
Examples:
> a := [b,1,2,2];
a := [b, 1, 2, 2]
> a[1];
b
> a[1..2];
[b, 1]
> a[-1..-1];
[2]
> a[-2..-1];
[2, 2]
> a[3] := 77;
a[3] := 77
> a;
[b, 1, 77, 2]
> A := [[1,2],[3,0]];
A := [[1, 2], [3, 0]]
> V := [-2,3];
V := [-2, 3]
> A*V;
[4, -6]
> V*A;
[7, -4]
> 2*A;
[[2, 4], [6, 0]]
> A/3;
[[0.3333, 0.6667], [1, 0]]
> 7*V;
[-14, 21]
> V/5;
[-0.4000, 0.6000]
> V*V;
13
> B := 1/A;
B := [[0, 0.3333], [0.5000, -0.1667]]
> A*B;
[[1, 0], [0, 1]]
> V+[0,1];
[-2, 4]
See also: ?append ?CreateArray ?matrix ?member ?mselect ?set ?subset
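The row/column interpretation of a vector in the products above can be spelled out explicitly. The following Python sketch (not Darwin code) reproduces A*V, V*A and V*V from the examples; the function names mat_vec, vec_mat and dot are hypothetical.

```python
def mat_vec(A, v):
    """A*V: the vector acts as a column, one dot product per matrix row."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def vec_mat(v, A):
    """V*A: the vector acts as a row, one dot product per matrix column."""
    return [sum(v[i] * A[i][j] for i in range(len(v)))
            for j in range(len(A[0]))]

def dot(u, v):
    """V*V: inner product of two vectors, giving a scalar."""
    return sum(a * b for a, b in zip(u, v))

A = [[1, 2], [3, 0]]
V = [-2, 3]
print(mat_vec(A, V))   # [4, -6]
print(vec_mat(V, A))   # [7, -4]
print(dot(V, V))       # 13
```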
matrix
Class matrix - a matrix of elements
Template: CreateArray(1..m,1..n)
[[...], [...], ...]
Returns:
matrix
Methods: inverse matrix_type print Rand
Synopsis: A matrix in darwin is a list of lists where all the internal lists
have the same length. A matrix can be created with CreateArray, explicitly
as a list of lists, with append or iteratively. Algebra between matrix and
scalars or between matrix and vectors follows the normal rules of Linear
Algebra. A matrix multiplied by a vector on the right assumes the vector is
a column vector. A matrix multiplied by a vector on the left assumes the
vector is a row vector.
Examples:
> [[1,2],[2,3]];
[[1, 2], [2, 3]]
> CreateArray(1..3,1..4,777);
[[777, 777, 777, 777], [777, 777, 777, 777], [777, 777, 777, 777]]
See Also:
?Cholesky ?Eigenvalues ?LinearProgramming ?SvdAnalysis
?convolve ?GaussElim ?LinearRegression ?SvdBestBasis
?CovarianceAnalysis ?GivensElim ?list ?transpose
?CreateArray ?Identity ?matrix_inverse
set
Class set - (mathematical) set of arbitrary elements
Template: {}
{a}
{a,...}
Returns:
set
Fields:
Name Type
---------------------------------------------------------
i the ith element in the set
-i the ith element from the right
i..j an expseq of elements from the set
Methods: power Rand set_type
Synopsis: A set holds a set of arbitrary values or structures. Elements in
the set are ordered according to a unique order, and repeated elements are
removed. Elements of a set (when the user is sure where they are located),
can be replaced with an assignment statement. When selecting with an
integer range, negative values are interpreted as counting from the right.
I.e. -2..-1 select the last two elements of the set. The sorting of sets is
very efficient, so if order is desired, placing the information in sets may
be more efficient.
Examples:
> a := {b,1,2,[d,e]};
a := {1,2,b,[d, e]}
> a[1];
1
> a[1..2];
1, 2
> a[-1..-1];
[d, e]
> a[-2..-1];
b, [d, e]
> a[3] := 77;
a[3] := 77
> a;
{1,2,77,[d, e]}
See Also:
?append ?list ?minus ?sort ?union
?intersect ?member ?mselect ?subset
amino acids peptides
Amino acids, ordinal numbers, one letter codes, 3 letter codes,
molecular weight and name
1 A Ala 89.079 Alanine
2 R Arg 174.188 Arginine
3 N Asn 132.104 Asparagine
4 D Asp 133.089 Aspartic acid
5 C Cys 121.144 Cysteine
6 Q Gln 146.131 Glutamine
7 E Glu 147.116 Glutamic acid
8 G Gly 75.052 Glycine
9 H His 155.142 Histidine
10 I Ile 131.160 Isoleucine
11 L Leu 131.160 Leucine
12 K Lys 146.174 Lysine
13 M Met 149.198 Methionine
14 F Phe 165.177 Phenylalanine
15 P Pro 115.117 Proline
16 S Ser 105.078 Serine
17 T Thr 119.105 Threonine
18 W Trp 204.213 Tryptophan
19 Y Tyr 181.170 Tyrosine
20 V Val 117.113 Valine
See Also:
?AAAToInt ?BaseToInt ?CIntToCodon ?CodonToInt ?IntToBase
?aminoacids ?BBBToInt ?CIntToInt ?GeneticCode ?IntToBBB
?AminoToInt ?BToInt ?CodonCode ?IntToA ?IntToCInt
?AToCInt ?CIntToA ?CodonToA ?IntToAAA ?IntToCodon
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToAmino
?AToInt ?CIntToAmino ?CodonToInt ?IntToB
bases nucleotides
DNA/RNA bases, ordinal numbers, one letter codes, 3 letter codes
and name
1 A Ade Adenine
2 C Cyt Cytosine
3 G Gua Guanine
4 T Thy Thymine
5 U Ura Uracil
See Also:
?AltGenCode ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB
?AminoToInt ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt
?antiparallel ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon
?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase
conversion translation
Amino acid and genetic code conversion functions
Amino acid translation functions
----------------------------------------------------------------------------------------------
|                                             To                                             |
| From          1-letter AA 3-letter AA full name AA AA indx 1-20 3-letter cod cod indx 1-64 |
|--------------------------------------------------------------------------------------------|
| 1-letter AA   ---                                  AToInt       AToCodon                   |
| 3-letter AA               ---                      AAAToInt                                |
| full name AA                          ---          AminoToInt                              |
| AA indx 1-20  IntToA      IntToAAA    IntToAmino   ---          IntToCodon                 |
| 3-letter cod  CodonToA                             CodonToInt   ---          CodonToCInt   |
| cod indx 1-64 CIntToA     CIntToAAA   CIntToAmino  CIntToInt    CIntToCodon  ---           |
----------------------------------------------------------------------------------------------
See Also:
?AAAToInt ?BaseToInt ?CIntToCodon ?CodonToInt ?IntToBase
?aminoacids ?BBBToInt ?CIntToInt ?GeneticCode ?IntToBBB
?AminoToInt ?BToInt ?CodonCode ?IntToA ?IntToCInt
?AToCInt ?CIntToA ?CodonToA ?IntToAAA ?IntToCodon
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToAmino
?AToInt ?CIntToAmino ?CodonToInt ?IntToB
CodonCode
Codon, Codon number, one letter aa code, integer aa representation
AAA 1 K 12 AAC 2 N 3 AAG 3 K 12 AAT 4 N 3
ACA 5 T 17 ACC 6 T 17 ACG 7 T 17 ACT 8 T 17
AGA 9 R 2 AGC 10 S 16 AGG 11 R 2 AGT 12 S 16
ATA 13 I 10 ATC 14 I 10 ATG 15 M 13 ATT 16 I 10
CAA 17 Q 6 CAC 18 H 9 CAG 19 Q 6 CAT 20 H 9
CCA 21 P 15 CCC 22 P 15 CCG 23 P 15 CCT 24 P 15
CGA 25 R 2 CGC 26 R 2 CGG 27 R 2 CGT 28 R 2
CTA 29 L 11 CTC 30 L 11 CTG 31 L 11 CTT 32 L 11
GAA 33 E 7 GAC 34 D 4 GAG 35 E 7 GAT 36 D 4
GCA 37 A 1 GCC 38 A 1 GCG 39 A 1 GCT 40 A 1
GGA 41 G 8 GGC 42 G 8 GGG 43 G 8 GGT 44 G 8
GTA 45 V 20 GTC 46 V 20 GTG 47 V 20 GTT 48 V 20
TAA 49 $ 22 TAC 50 Y 19 TAG 51 $ 22 TAT 52 Y 19
TCA 53 S 16 TCC 54 S 16 TCG 55 S 16 TCT 56 S 16
TGA 57 $ 22 TGC 58 C 5 TGG 59 W 18 TGT 60 C 5
TTA 61 L 11 TTC 62 F 14 TTG 63 L 11 TTT 64 F 14
See Also:
?AltGenCode ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB
?AminoToInt ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt
?antiparallel ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon
?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase
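The codon numbers in the table above follow a base-4 pattern with the bases ordered A, C, G, T. The Python sketch below (not Darwin code) reproduces that numbering; the function names are hypothetical and are not the Darwin conversion functions, and accepting U in place of T is an assumption.

```python
BASES = "ACGT"   # base order implied by the table: A=0, C=1, G=2, T=3

def codon_to_number(codon):
    """Map a codon such as 'ATG' to its number in 1..64."""
    b1, b2, b3 = (BASES.index(b) for b in codon.upper().replace("U", "T"))
    return 16 * b1 + 4 * b2 + b3 + 1

def number_to_codon(n):
    """Inverse mapping: a number in 1..64 back to its codon."""
    n -= 1
    return BASES[n // 16] + BASES[(n // 4) % 4] + BASES[n % 4]

print(codon_to_number("AAA"))   # 1
print(codon_to_number("ATG"))   # 15
print(codon_to_number("TTT"))   # 64
print(number_to_codon(59))      # TGG
```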
genetic code
GGG G Gly AGG R Arg CGG R Arg UGG W Trp
GGA G Gly AGA R Arg CGA R Arg UGA Stop
GGC G Gly AGC S Ser CGC R Arg UGC C Cys
GGU G Gly AGU S Ser CGU R Arg UGU C Cys
GAG E Glu AAG K Lys CAG Q Gln UAG Stop
GAA E Glu AAA K Lys CAA Q Gln UAA Stop
GAC D Asp AAC N Asn CAC H His UAC Y Tyr
GAU D Asp AAU N Asn CAU H His UAU Y Tyr
GCG A Ala ACG T Thr CCG P Pro UCG S Ser
GCA A Ala ACA T Thr CCA P Pro UCA S Ser
GCC A Ala ACC T Thr CCC P Pro UCC S Ser
GCU A Ala ACU T Thr CCU P Pro UCU S Ser
GUG V Val AUG M Met CUG L Leu UUG L Leu
GUA V Val AUA I Ile CUA L Leu UUA L Leu
GUC V Val AUC I Ile CUC L Leu UUC F Phe
GUU V Val AUU I Ile CUU L Leu UUU F Phe
See Also:
?AltGenCode ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB
?AminoToInt ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt
?antiparallel ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon
?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse
?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase
enzymes enzyme digestor digester
For SearchMassDb the following enzymes are recognized (courtesy
of Amos Bairoch):
Enzyme name cuts between except for
########### ############ ##########
Armillaria Xaa-Cys,Xaa-Lys
ArmillariaMellea Xaa-Lys
BNPS_NCS Trp-Xaa
Chymotrypsin Trp-Xaa,Phe-Xaa,Tyr-Xaa, Trp-Pro,Phe-Pro,Tyr-Pro,
Met-Xaa,Leu-Xaa, Met-Pro,Leu-Pro
Clostripain Arg-Xaa
CNBr_Cys Met-Xaa,Xaa-Cys
CNBr Met-Xaa
AspN Xaa-Asp
LysC Lys-Xaa
Hydroxylamine Asn-Gly
MildAcidHydrolysis Asp-Pro
NBS_long Trp-Xaa,Tyr-Xaa,His-Xaa
NBS_short Trp-Xaa,Tyr-Xaa
NTCB Xaa-Cys
PancreaticElastase Ala-Xaa,Gly-Xaa,Ser-Xaa,Val-Xaa
PapayaProteinaseIV Gly-Xaa
PostProline Pro-Xaa Pro-Pro
Thermolysin Xaa-Leu,Xaa-Ile,Xaa-Met,
Xaa-Phe,Xaa-Trp,Xaa-Val
TrypsinArgBlocked Lys-Xaa Lys-Pro
TrypsinCysModified Arg-Xaa,Lys-Xaa,Cys-Xaa Arg-Pro,Lys-Pro,Cys-Pro
TrypsinLysBlocked Arg-Xaa Arg-Pro
Trypsin Arg-Xaa,Lys-Xaa Lys-Pro
V8AmmoniumAcetate Glu-Xaa Glu-Pro
V8PhosphateBuffer Asp-Xaa,Glu-Xaa Asp-Pro,Glu-Pro
The following are double digestors (both acting simultaneously)
CNBrTrypsin Met-Xaa
Arg-Xaa,Lys-Xaa Lys-Pro
CNBrAspN Met-Xaa
Xaa-Asp
CNBrLysC Met-Xaa
Lys-Xaa
CNBrV8AmmoniumAcetate Met-Xaa
Glu-Xaa Glu-Pro
CNBrV8PhosphateBuffer Met-Xaa
Asp-Xaa,Glu-Xaa Asp-Pro,Glu-Pro
Comments:
CNBr_Cys - its chemistry is not well defined so modifications of
other amino acids may occur.
NBS_long
NBS_short
NTCB
BNPS_NCS - these four digesters produce unpredictable chemical modifications
of other residues which will adversely affect the search.
Hydroxylamine
MildAcidHydrolysis - both of these produce at most one or two fragments per
protein and are therefore not useful for searching.
Chymotrypsin
PancreaticElastase
Thermolysin - these are neither as specific nor as complete in their
digestion as would be desired.
PapayaProteinaseIV
PostProline - these enzymes can only cleave small proteins, and hence are
not of great practical use.
CNBr - instead of methionine being left at the C-terminal, a homoserine
(101.1054) or homoserine lactone (83.092) is produced.
TrypsinCysModified - all the cysteines are transformed into aminoethyl-
cysteine (146.2133).
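The "cuts between / except for" columns can be read as a simple rule over adjacent residues. The Python sketch below (not Darwin code) applies the Trypsin row exactly as listed above: cut after Arg or Lys, except when Lys is followed by Pro. The function name is hypothetical, one-letter codes are assumed, and fragment weights are not computed.

```python
def digest_trypsin(seq):
    """Split a one-letter peptide sequence using the Trypsin row of the table."""
    fragments = []
    start = 0
    for i in range(len(seq) - 1):
        cuts_here = seq[i] in "RK"                        # Arg-Xaa, Lys-Xaa
        excepted = seq[i] == "K" and seq[i + 1] == "P"    # except Lys-Pro
        if cuts_here and not excepted:
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])   # the remaining C-terminal fragment
    return fragments

print(digest_trypsin("AKRPGKPLRM"))   # ['AK', 'R', 'PGKPLR', 'M']
```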
input/output input output io i/o
Input/output is done in Darwin through function calls.
The open commands cause no immediate input/output, they are expected
to be followed by read or write commands.
The open commands accept the name 'terminal', meaning the standard
interactive input and output (stdin/stdout) of Darwin.
File input/output
dprint - print a general expression so that it can be read back
lprint - print a general expression
OpenAppending - all future output will be appended to file
OpenPipe - all future ReadRawLine commands will read from pipe
OpenReading - all future ReadRawLine commands will read from file
OpenWriting - all future output will go to file
print - pretty print expressions
printf - print according to format
ReadRawFile - read an entire file as a string
ReadRawLine - read a line as a string
Darwin commands input/output
OpenPipe - all future ReadLine commands will read from pipe
OpenReading - all future ReadLine commands will read from file
ReadLibrary - read a file/function from the darwin library
ReadLine - reads a darwin command in a single line
ReadOffsetLine - reads a darwin command from a file (at a given offset)
ReadProgram - read an entire file of darwin commands
WriteTree
Databases input/output
Protein/Nucleotide
ReadDb
ReadBrk
ReadDomain
ReadDssp
ReadFasta
ReadMap
ReadMsa
ReadPima
ReadPir
WriteDomainDB
WriteFasta
Grid files
AddGrid
CloseGrid
CompressGrid
CreateGrid
FlushGrid
GetNextGrid
MapGrid
OpenGrid
QueryGrid
UncompressGrid
NDBM
StoreKey
FindKey
Plotting output
plotoutput
DrawGraph
DrawHistogram
DrawDistribution
DrawDotplot
SmoothData
DrawStackedBar
ViewPlot
StartOverlayPlot
StopOverlayPlot
GetColorMap
DrawTree
DrawTCount
DrawBisectTree
DrawUnrootedTree
DrawSimPam
System commands
CallSystem
TimedCallSystem
date
time
rtime
selector function
A selector function is a function which allows the user/programmer to
define the rules and names for selection. For a data structure D,
(whether this is internal, as defined by type, or user defined), if
D_select is assigned a function, then for every selector which is not
a positive integer or an integer range, the function D_select will
be called to do the selection (or assignment to a selected part).
Example:
Let Imaginary be a user-defined data structure with two parts,
the real part and the imaginary part. A selector which implements
the common names Re and Im can be written as follows:
Imaginary_select := proc( c:Imaginary, sel, val:numeric )
if sel = Im then
if nargs=3 then c[2] := val else c[2] fi
elif sel = Re then
if nargs=3 then c[1] := val else c[1] fi
else error(sel,'is an invalid selector for an Imaginary number') fi
end:
a := Imaginary(1.0,-1.0);
a[Im];
a[Re] := 0; a;
Here we assume that the definition of Imaginary is
Imaginary := proc( realpart:numeric, imagpart:numeric ) ... end:
Notice that when the selector function is called with two arguments,
it is an indication that a value is to be selected. When it is called
with 3 arguments, it is an indication that an assignment should be
made.
printlevel
debugging information
printlevel - controlling the amount of printed output from Darwin.
Normally, the result of every statement executed at the top level is printed.
This printing is controlled by a global variable named printlevel. By default
this variable is assigned 1. At this level, expressions or assignments at the
top level and nested one level will be printed. E.g.
This is an assignment at the top level, it is printed.
> a := 1;
a := 1
This is an expression nested one level, it is printed.
> if a=1 then 7 fi;
7
This is an assignment nested two levels deep, nothing is printed.
> for i to 2 do if a=1 then c := 1 fi od;
By increasing printlevel by 1, the printing will happen one level deeper. E.g.
> printlevel := 2;
printlevel := 2
> for i to 2 do if a=1 then c := 1 fi od;
c := 1
c := 1
By increasing printlevel to 5, the execution of any function called at the top
level will be printed. This becomes a very valuable tool for debugging and
inspecting Darwin functions. E.g.
> f := x -> x+1;
f := x -> x+1
> printlevel := 5;
printlevel := 5
> f(1);
{--> enter f, args = 1
2
<-- exit f = 2}
2
By increasing printlevel to 10 the statements in a nested function call will
be displayed. E.g.
> g := x->f(x):
> printlevel := 10:
> g(1);
{--> enter g, args = 1
{--> enter f, args = 1
2
<-- exit f = 2}
2
<-- exit g = 2}
2
Some additional printing is also controlled by printlevel. If printlevel is
higher than 2, then in case of an error, a complete traceback is printed with
all the local variables, parameters and their values. Many functions use
printlevel to print additional information about the problem they are solving.
Users are encouraged to use printlevel for this purpose. In this case, a value
of 1 should not print anything, and values greater than 4 are not recommended,
since the user will be forced to see the trace of top level function calls.
E.g.
my_function := proc( x )
. . . .
if printlevel > 2 then printf( 'hyperbolic cut method used\n' ) fi;
. . . .
end:
Notice that if you want to modify printlevel inside a function, you should
declare it in the global list, else by default it becomes a local variable.
See also: ?debug ?trace
profile profiling
callgraph calltrees callingtree
Profiling - Measuring how efficiently a program is executing. Darwin
provides tools for profiling the execution of a program and then analyzing
the results. Profiling is done at the level of Darwin functions; kernel
functions normally cannot be profiled.
The procedure is as follows. The program or session to be profiled is run
with the option profile. This option is set by the command
Set(profile); Darwin will then produce additional output, which is sent to the
standard output, consisting of one short line for every entry to and exit from
a Darwin function. This information can be analyzed by three external programs:
profile (which provides a basic profile per function), callgraph (which
provides a basic profile per caller-callee pair) and calltrees (which analyzes
the most resource-consuming complete call trees).
Let us assume that we want to profile the following LongInteger computation:
> LLL( [[1,0,LongInteger(1000000000)],
> [0,1,LongInteger(3141592654)]] );
[[LongInteger([-355]), LongInteger([113]), LongInteger([-30098])],
[LongInteger([-104348]), LongInteger([33215]), LongInteger([2610])]]
When this is run with option profile, the first few lines are:
> Set(profile):
> LLL( [[1,0,LongInteger(1000000000)],
> [0,1,LongInteger(3141592654)]] );
->LongInteger 21,81182,200
->LongInteger_normal 50,81204,200
<-LongInteger_normal 50,81291,200
<-LongInteger 21,81296,200
. . . . . .
The output contains the name of the function called, the recursion level, the
number of words allocated and the number of clock ticks. All profilers report
the time consumed and the storage requested. Ordering of the output is done
based on time*space^2, a reasonable way of scoring the composite time/space
resources. If this output is stored in a file, it can be later analyzed with
the programs profile, callgraph and calltrees. In a Unix system, the output
from Darwin can be piped into these programs directly, which avoids the need
for intermediate files, which may be very large. The first few lines
of the output of profile are:
12 different functions, using 1.017 secs and 107K words
name #calls cpu words
==== ====== === =====
Main_Routine 1 0.667 ( 65.6%) 81244 ( 76.1%)
LongInteger_times 78 0.100 ( 9.8%) 6426 ( 6.0%)
LongInteger_normal 133 0.033 ( 3.3%) 6759 ( 6.3%)
LongInteger 132 0.067 ( 6.6%) 3700 ( 3.5%)
LongInteger_iquo 10 0.033 ( 3.3%) 4013 ( 3.8%)
. . . .
where each row gives a function name, its number of calls, the cpu time and
the words allocated, each with its percentage of the total. The same output
analyzed with callgraph produces:
used by Callee
Caller Callee #calls time words
====== ====== ====== ==== =====
#++ calls Main_Routine 1, 1.017, 106762
#++ Main_Routine calls LLL 1, 0.350, 25290
#++ LLL calls LongInteger_times 63, 0.133, 8750
#++ LongInteger_times calls LongInteger 64, 0.067, 7276
#++ LLL calls LongInteger_power 12, 0.067, 5523
#++ LLL calls LongInteger_iquo 10, 0.050, 5029
. . . . .
In this case, for example, the function LLL calls the function
LongInteger_times 63 times, and these calls and all their descendants consume
0.133 secs and allocate 8750 words. Main_Routine stands for all the commands
executed at the top level. Its caller, shown blank in the first line of the
listing, is a fictitious level above the top level, used to be
able to report the entire session. Finally the output of calltrees is:
Calling sequence used by Callee
time Kwords
LLL uses
0.35 25.29
LLL calls
LongInteger_iquo uses
0.02 0.65
. . . .
LLL calls
LongInteger_power calls
LongInteger_times uses
0.02 0.49
. . .
In this case, the complete calling trees and their resource consumption are
shown in decreasing order of resources. This information is useful when
a significant amount of resources is consumed in a single calling path.
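The trace format and the time*space^2 ordering described above can be
illustrated with a short sketch. This is Python, not Darwin, and it is not the
source of the programs profile, callgraph or calltrees. It assumes that the
three numbers on each trace line are, in order, the recursion level, the
cumulative words allocated and the cumulative clock ticks; the function names
inclusive_usage and score are hypothetical.

```python
import re
from collections import defaultdict

# One trace line: "->Name level,words,ticks" on entry, "<-Name ..." on exit.
# The field order is an assumption taken from the description in the text.
TRACE = re.compile(r"^(->|<-)(\S+)\s+(\d+),(\d+),(\d+)$")


def inclusive_usage(lines):
    """Return {name: [calls, words, ticks]} summed over entry/exit pairs."""
    stack = []
    totals = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        m = TRACE.match(line.strip())
        if m is None:
            continue  # ordinary session output, not a trace line
        arrow, name = m.group(1), m.group(2)
        words, ticks = int(m.group(4)), int(m.group(5))
        if arrow == "->":
            stack.append((name, words, ticks))
            continue
        entered, words0, ticks0 = stack.pop()
        if entered != name:
            raise ValueError(f"exit from {name} does not match entry to {entered}")
        entry = totals[name]
        entry[0] += 1
        entry[1] += words - words0   # includes all descendants
        entry[2] += ticks - ticks0
    return dict(totals)


def score(seconds, words):
    """Composite resource score used for ordering: time*space^2."""
    return seconds * words ** 2


# The four trace lines shown earlier in this topic.
sample = [
    "->LongInteger 21,81182,200",
    "->LongInteger_normal 50,81204,200",
    "<-LongInteger_normal 50,81291,200",
    "<-LongInteger 21,81296,200",
]
usage = inclusive_usage(sample)

# Rows (name, cpu seconds, words) copied from the profile listing above.
rows = [
    ("Main_Routine", 0.667, 81244),
    ("LongInteger_times", 0.100, 6426),
    ("LongInteger_normal", 0.033, 6759),
    ("LongInteger", 0.067, 3700),
    ("LongInteger_iquo", 0.033, 4013),
]
ranked = sorted(rows, key=lambda r: score(r[1], r[2]), reverse=True)
```

Sorting the listed rows by this score reproduces the order in which profile
prints them, which is consistent with the time*space^2 rule.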
The user has additional control, and can identify any block of code to profile
it and incorporate it into the rest of the profiling computations. This is done
with the commands EnterProfile and ExitProfile. For example, the following
function identifies a for-loop as BigLoop:
f := proc(n)
s1 := sum(1/i,i=1..n);
EnterProfile(BigLoop);
s2 := 0; for i to n do s2 := s2+1/i od;
ExitProfile(BigLoop);
[s1,s2,s1-s2]
end:
Set(profile): f(10^5);
which when profiled with callgraph produces:
used by Callee
Caller Callee #calls time words
====== ====== ====== ==== =====
#++ calls Main_Routine 1, 4.017, 467923
#++ Main_Routine calls f 1, 3.600, 400022
#++ f calls BigLoop 1, 2.983, 399996
#++ BigLoop calls gc 1, 0.067, 0
See Also: ?EnterProfile ?ExitProfile ?printlevel ?debug
command commandline
libname execute run darwin
Darwin is a program which can be executed interactively or
in batch. In all cases it will read commands, execute them and
write the output results.
In Unix or Linux, the command name is darwin. Besides redirection,
the command accepts the following options:
darwin [-q] [-s] [-B] [-E] [-U] [-l lib_dir] [-S init_file] [-i input_file] [-o output_file]
-q - quiet option. Do not echo input statements
(in batch mode), and do not print garbage collection
messages or the final resources used message.
This option can be changed with the Set command.
-s - server option. Work as a server. This means
that the system attempts to be immune to hostile
programs that may be executing. It will not
execute system commands, read or write files
(except for reading from the library), use
grid files, use tcp commands, use CallExternal
or open pipes.
This option can be set with the Set command.
-B - batch option. Work in batch mode. This means
the system will exit when it encounters end of
input or CTRL-C.
This option can be changed with the Set command.
-E - errorexit option. In this mode the system will
exit with a nonzero status when it encounters an
untrapped error.
This option can be changed with the Set command.
-U - Unbuffered option. The standard output, when
redirected to a file, is not buffered. So any
output will be stored in the file immediately.
This is very useful for debugging to see the
very last actions of the system in case of a
crash.
-l - Use the given directory as root for the darwin
library. This value defaults to "lib". The
global variable "libname" is set to this value.
Darwin will always use the value in "libname"
to load library functions.
-S - Use the given file as initialization script
instead of /darwinit.
-i - Use the given file as standard input. This
replaces the standard redirection available in
Unix.
-o - Use the given file as standard output. This
replaces the standard redirection available in
Unix.
returntype return
returntyping returnvalue
Procedure declarations allow the definition of a return type after the parameter
declarations. This definition is done with the -> operator and terminated by
a colon or a semicolon. E.g.:
my_function := proc () -> numeric;
42.1;
end;
If the function is written incorrectly, i.e. it returns a string when numeric
is declared as the return type, it will give an error. E.g.
my_function := proc () -> numeric;
'hello';
end;
> my_function();
'my_function should return numeric, returned: hello
Error, (in my_function) invalid return value'
Programs can test for the return type, by selecting the 5th component of a
procedure body. For example, the expression op(5,op(my_function)); entered at
the command prompt will return "numeric" when my_function is defined as above.
Return types are used by the Inherit function to determine what data type should
be returned by methods that are inherited from another class.
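The behaviour of a declared return type can be modelled as a check applied to
the value a function returns. The following is a minimal sketch in Python, not
Darwin code and not how the Darwin kernel implements it: the decorator
returns, the attribute return_type and the function bad_function are
hypothetical names, and numbers.Real stands in for the Darwin type numeric.

```python
import functools
import numbers


def returns(expected):
    """Attach a declared return type to a function and check it on each call."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not isinstance(result, expected):
                raise TypeError(
                    f"{func.__name__} should return {expected.__name__}, "
                    f"returned: {result}"
                )
            return result
        # Keep the declared type available for inspection by other code.
        wrapper.return_type = expected
        return wrapper
    return decorate


@returns(numbers.Real)
def my_function():
    return 42.1


@returns(numbers.Real)
def bad_function():
    return "hello"
```

Calling my_function() returns its value unchanged, while calling
bad_function() raises an error that names the function and the offending
value. Storing the declared type on the function plays the role of the
component that programs can inspect.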
See also: ?proc ?option ?Inherit
hydrophobicity Fauchere FreeEnergy ChouFasman AtomicVolume
Function hydrophobicity - define various measures of hydrophobicity and atomic volume
Calling Sequence: No call needed as soon as library hydrophobicity is available
Parameters:
Returns:
NULL
Synopsis: This function assigns the global variables FauchereHydrophobicity,
FreeEnergyHydrophobicity, ChouFasman and AtomicVolume. Each of these variables
is assigned a vector of length 20. Each element of these vectors contains the
value, for one amino acid, of the chemical property that the variable name
refers to. Indexing of the amino acids is done according to AAAToInt. The
following values are used as chemical properties:
Amino acid   Fauchere   Free Energy   Chou Fasman   Atomic Volume
-----------------------------------------------------------------
Arg             -1.01         19.92          1.04             225
Lys             -0.99         -9.52          0.98             171
Asp             -0.77        -10.95          1.20             125
Glu             -0.64        -10.20          0.86             155
Asn             -0.60         -9.68          1.35             135
Gln             -0.22         -9.38          0.86             161
Ser             -.004         -5.06          1.32              99
Gly              0.00          2.39          1.50              66
His              0.13        -10.27          1.06             167
Thr              0.26         -4.88          1.07             122
Ala              0.31          1.94          0.70              92
Pro              0.72          0.00          1.59             129
Tyr              0.96         -6.11          1.06             203
Val              1.20          1.99          0.62             142
Met              1.23         -1.48          0.58             171
Cys              1.54         -1.24          1.18             106
Leu              1.70          2.28          0.68             168
Phe              1.79         -0.76          0.71             203
Ile              1.80          2.15          0.66             169
Trp              2.25         -5.88          0.75             240
Examples:
See also:
AAAP The darwin 1.6 function AAAP has been renamed to AAAToInt in Darwin v2.2.
ACS The darwin 1.6 function ACS has been renamed to AC in Darwin v2.2.
AP The darwin 1.6 function AP has been renamed to AToInt in Darwin v2.2.
AaCount The darwin 1.6 function AaCount has been renamed to GetAaCount in Darwin v2.2.
AaFrequency The darwin 1.6 function AaFrequency has been renamed to GetAaFrequency in Darwin v2.2.
AddGF The darwin 1.6 function AddGF has been renamed to AddGrid in Darwin v2.2.
AlignGaps The darwin 1.6 function AlignGaps has been renamed to AdjustGaps in Darwin v2.2.
AlignedIntrons The darwin 1.6 function AlignedIntrons has been renamed to GetIntrons in Darwin v2.2.
AlignedPeptide The darwin 1.6 function AlignedPeptide has been renamed to GetPeptides in Darwin v2.2.
AllMatches The darwin 1.6 function AllMatches has been renamed to GetAllMatches in Darwin v2.2.
AminoP The darwin 1.6 function AminoP has been renamed to AminoToInt in Darwin v2.2.
ApprTextSearch The darwin 1.6 function ApprTextSearch has been renamed to ApproxSearchString in Darwin v2.2.
BBBC The darwin 1.6 function BBBC has been renamed to CodonToCInt
BBBP The darwin 1.6 function BBBP has been renamed to BBBToInt in Darwin v2.2.
BP The darwin 1.6 function BP has been renamed to BToInt in Darwin v2.2.
BackDynProgr The darwin 1.6 function BackDynProgr is obsolete, use Align.
BaseP The darwin 1.6 function BaseP has been renamed to BaseToInt in Darwin v2.2.
BestPamShake The darwin 1.6 function BestPamShake is obsolete, use Align.
BestStringMatch The darwin 1.6 function BestStringMatch has been renamed to SearchString in Darwin v2.2.
BisectTree The darwin 1.6 function BisectTree has been renamed to DrawBisectTree in Darwin v2.2.
CBBB The darwin 1.6 function CBBB has been renamed to CIntToCodon
CleanMSA The darwin 1.6 function CleanMSA has been renamed to RemoveGaps in Darwin v2.2.
CT_Species The darwin 1.6 function CT_Species has been renamed to AddSpecies in Darwin v2.2.
CloseGF The darwin 1.6 function CloseGF has been renamed to CloseGrid in Darwin v2.2.
ColorMap The darwin 1.6 function ColorMap has been renamed to GetColorMap in Darwin v2.2.
ColorTree The darwin 1.6 function ColorTree has been renamed to CreateColoredTree in Darwin v2.2.
CompressGF The darwin 1.6 function CompressGF has been renamed to CompressGrid in Darwin v2.2.
ConvertSP The darwin 1.6 function ConvertSP has been renamed to SpToDarwin in Darwin v2.2.
ConvertToDF The darwin 1.6 function ConvertToDF has been renamed to DbToDarwin in Darwin v2.2.
CreateGF The darwin 1.6 function CreateGF has been renamed to CreateGrid in Darwin v2.2.
DF The darwin 1.6 function DF has been renamed to DB in Darwin v2.2.
DMDMS The darwin 1.6 function DMDMS has been renamed to CreateDayMatrices in Darwin v2.2.
DnaFile The darwin 1.6 function DnaFile has been renamed to database in Darwin v2.2.
DNAPepDayhoffM The darwin 1.6 function DNAPepDayhoffM has been renamed to ApproxDnaDayMatrix in Darwin v2.2.
Dayhoff The darwin 1.6 function Dayhoff has been renamed to CreateOrigDayMatrix in Darwin v2.2.
DayhoffM The darwin 1.6 function DayhoffM has been renamed to CreateDayMatrix in Darwin v2.2.
DelFixed The darwin 1.6 function DelFixed has been renamed to FixedDel in Darwin v2.2.
DelIncr The darwin 1.6 function DelIncr has been renamed to IncDel in Darwin v2.2.
DigestSeqs The darwin 1.6 function DigestSeqs has been renamed to DigestSeq in Darwin v2.2.
Distribution The darwin 1.6 function Distribution has been renamed to DrawDistribution in Darwin v2.2.
DotPlot The darwin 1.6 function DotPlot has been renamed to DrawDotplot in Darwin v2.2.
DynProgr The darwin 1.6 function DynProgr has been renamed to DynProgScore in Darwin v2.2.
ERROR The darwin 1.6 function ERROR has been renamed to error in Darwin v2.2.
EndOverlayPlot The darwin 1.6 function EndOverlayPlot has been renamed to StopOverlayPlot in Darwin v2.2.
Entries The darwin 1.6 function Entries has been renamed to Entry in Darwin v2.2.
Entropy The darwin 1.6 function Entropy has been renamed to FindEntropy in Darwin v2.2.
EntryInfo The darwin 1.6 function EntryInfo has been renamed to GetEntryInfo in Darwin v2.2.
EntryNumber The darwin 1.6 function EntryNumber has been renamed to GetEntryNumber in Darwin v2.2.
EqradTree The darwin 1.6 function EqradTree has been renamed to DrawBisectTree in Darwin v2.2.
ExponFit The darwin 1.6 function ExponFit has been renamed to ExpFit in Darwin v2.2.
ExponFit2 The darwin 1.6 function ExponFit2 has been renamed to ExpFit2 in Darwin v2.2.
ExtCallFrame The darwin 1.6 function ExtCallFrame has been renamed to CreateCProgram in Darwin v2.2.
FlushGF The darwin 1.6 function FlushGF has been renamed to FlushGrid in Darwin v2.2.
FragSearch The darwin 1.6 function FragSearch has been renamed to SearchFrag in Darwin v2.2.
GetBetween The darwin 1.6 function GetBetween has been renamed to GetLcaSubtree in Darwin v2.2.
GetIndex The darwin 1.6 function GetIndex has been renamed to FindTreeFitIndex in Darwin v2.2.
GetLabels The darwin 1.6 function GetLabels has been renamed to GetTreeLabels in Darwin v2.2.
GetTreeLength The darwin 1.6 function GetTreeLength has been renamed to TotalTreeWeight in Darwin v2.2.
GetPath The darwin 1.6 function GetPath has been renamed to GetPathDistance in Darwin v2.2.
Histogram The darwin 1.6 function Histogram has been renamed to DrawHistogram in Darwin v2.2.
IDS The darwin 1.6 function IDS has been renamed to ID in Darwin v2.2.
IPCconnect The darwin 1.6 function IPCconnect has been renamed to ConnectTcp in Darwin v2.2.
IPCdisconnect The darwin 1.6 function IPCdisconnect has been renamed to DisconnectTcp in Darwin v2.2.
IPCread The darwin 1.6 function IPCread has been renamed to ReadTcp in Darwin v2.2.
IPCreceive The darwin 1.6 function IPCreceive has been renamed to ReceiveTcp in Darwin v2.2.
IPCsend The darwin 1.6 function IPCsend has been renamed to SendTcp in Darwin v2.2.
IPCreceiveDATA The darwin 1.6 function IPCreceiveDATA has been renamed to ReceiveDataTcp in Darwin v2.2.
IPCsendDATA The darwin 1.6 function IPCsendDATA has been renamed to SendDataTcp in Darwin v2.2.
LabelTree The darwin 1.6 function LabelTree has been renamed to ChangeLeafLabels in Darwin v2.2.
LarsonTree The darwin 1.6 function LarsonTree has been renamed to DrawUnrootedTree in Darwin v2.2.
LinRegr The darwin 1.6 function LinRegr has been renamed to LinearRegression in Darwin v2.2.
LoadFile The darwin 1.6 function LoadFile has been renamed to ReadDb in Darwin v2.2.
LongestRep The darwin 1.6 function LongestRep has been renamed to FindLongestRep in Darwin v2.2.
MultiAlign The darwin 2.2 function MultiAlign has been renamed to MAlignment in Darwin v3.0.
MachineUsage The darwin 1.6 function MachineUsage has been renamed to GetMachineUsage in Darwin v2.2.
MapGF The darwin 1.6 function MapGF has been renamed to MapGrid in Darwin v2.2.
MassDyn The darwin 1.6 function MassDyn has been renamed to DynProgMass in Darwin v2.2.
MassDynAll The darwin 1.6 function MassDynAll has been renamed to DynProgMassDb in Darwin v2.2.
MassProfile The darwin 1.6 function MassProfile has been renamed to SearchMassDb in Darwin v2.2.
Maximize The darwin 1.6 function Maximize has been renamed to MaximizeFunc in Darwin v2.2.
MinSqTree The darwin 1.6 function MinSqTree has been renamed to MinSquareTree in Darwin v2.2.
Minimize The darwin 1.6 function Minimize has been renamed to MinimizeFunc in Darwin v2.2.
Minimize2D The darwin 1.6 function Minimize2D has been renamed to Minimize2DFunc in Darwin v2.2.
Minimizex The darwin 1.6 function Minimizex has been renamed to DisconMinimize in Darwin v2.2.
MolWeight The darwin 1.6 function MolWeight has been renamed to GetMolWeight in Darwin v2.2.
MostFrequent The darwin 1.6 function MostFrequent has been renamed to GetMostFrequentGrams in Darwin v2.2.
MoveGap The darwin 1.6 function MoveGap has been renamed to MoveGap in Darwin v2.2.
MultAlign The darwin 1.6 function MultAlign has been renamed to CreateMultiAlign in Darwin v2.2.
NewArray The darwin 1.6 function NewArray has been renamed to CreateArray in Darwin v2.2.
NewString The darwin 1.6 function NewString has been renamed to CreateString in Darwin v2.2.
NextGF The darwin 1.6 function NextGF has been renamed to GetNextGrid in Darwin v2.2.
NPAlignMatch The darwin 1.6 function NPAlignMatch has been renamed to AlignNucPepMatch in Darwin v2.2.
NPAllMatches The darwin 1.6 function NPAllMatches has been renamed to GetAllNucPepMatches in Darwin v2.2.
NPBackDynProgr The darwin 1.6 function NPBackDynProgr has been renamed to NucPepBackDynProg in Darwin v2.2.
NPBestPamMatch The darwin 1.6 function NPBestPamMatch has been renamed to FindNucPepPam in Darwin v2.2.
NPBestPamShake The darwin 1.6 function NPBestPamShake has been renamed to LocalNucPepAlignBestPam in Darwin v2.2.
NPDynProgr The darwin 1.6 function NPDynProgr has been renamed to NucPepDynProg in Darwin v2.2.
NPMatch The darwin 1.6 function NPMatch has been renamed to NucPepMatch in Darwin v2.2.
NPMultiAllMatches The darwin 1.6 function NPMultiAllMatches has been renamed to ParallelAllNucPepMatches in Darwin v2.2.
NPOneAllMatch The darwin 1.6 function NPOneAllMatch has been renamed to AlignNucPepAll in Darwin v2.2.
NPRefine The darwin 1.6 function NPRefine has been renamed to GlobalNucPepAlign in Darwin v2.2.
NPRefineShake The darwin 1.6 function NPRefineShake has been renamed to LocalNucPepAlign in Darwin v2.2.
NPRegions The darwin 1.6 function NPRegions has been renamed to NucPepRegions in Darwin v2.2.
NPSprintMatch The darwin 1.6 function NPSprintMatch has been renamed to DynProgNucPepString in Darwin v2.2.
Offsets The darwin 1.6 function Offsets has been renamed to Offset in Darwin v2.2.
OneAllMatch The darwin 1.6 function OneAllMatch has been renamed to AlignOneAll in Darwin v2.2.
OpenGF The darwin 1.6 function OpenGF has been renamed to OpenGrid in Darwin v2.2.
OrderedSearch The darwin 1.6 function OrderedSearch has been renamed to SearchOrderedArray in Darwin v2.2.
PA The darwin 1.6 function PA has been renamed to IntToA in Darwin v2.2.
PAAA The darwin 1.6 function PAAA has been renamed to IntToAAA in Darwin v2.2.
PAmino The darwin 1.6 function PAmino has been renamed to IntToAmino in Darwin v2.2.
PB The darwin 1.6 function PB has been renamed to IntToN in Darwin v2.2.
PBBB The darwin 1.6 function PBBB has been renamed to IntToNuc in Darwin v2.2.
PBase The darwin 1.6 function PBase has been renamed to IntToNucleic in Darwin v2.2.
PItoPam The darwin 1.6 function PItoPam has been renamed to PerIdentToPam in Darwin v2.2.
PamtoPI The darwin 1.6 function PamtoPI has been renamed to PamToPerIdent in Darwin v2.2.
ParExec The darwin 1.6 function ParExec has been renamed to ParExecute in Darwin v2.2.
ParExec2 The darwin 1.6 function ParExec2 has been renamed to ParExecuteIPC in Darwin v2.2.
ParTest The darwin 1.6 function ParTest has been renamed to ParExecuteTest in Darwin v2.2.
PatEntries The darwin 1.6 function PatEntries has been renamed to PatEntry in Darwin v2.2.
PepPepSearch The darwin 1.6 function PepPepSearch is obsolete, use FragSearch.
SearchPepAll The darwin 1.6 function SearchPepAll is obsolete, use FragSearch.
PhyloTree The darwin 1.6 function PhyloTree has been renamed to PhylogeneticTree in Darwin v2.2.
PickTree The darwin 1.6 function PickTree has been renamed to FindLabeledSubtree in Darwin v2.2.
PlotPam The darwin 1.6 function PlotPam has been renamed to DrawSimPam in Darwin v2.2.
PlotOptions The darwin 1.6 function PlotOptions has been renamed to Plot in Darwin v2.2.
PosInfo The darwin 1.6 function PosInfo has been renamed to GetPosition in Darwin v2.2.
PositionDF The darwin 1.6 function PositionDF has been renamed to GetOffset in Darwin v2.2.
PrintSeqsInTree The darwin 1.6 function PrintSeqsInTree has been renamed to PrintTreeSeq in Darwin v2.2.
ProbDynProgr The darwin 1.6 function ProbDynProgr has been renamed to ProbDynProg in Darwin v2.2.
ProfileEnter The darwin 1.6 function ProfileEnter has been renamed to EnterProfile in Darwin v2.2.
ProfileExit The darwin 1.6 function ProfileExit has been renamed to ExitProfile in Darwin v2.2.
QueryAll The darwin 1.6 function QueryAll has been renamed to AllQueryGrid in Darwin v2.2.
QueryGF The darwin 1.6 function QueryGF has been renamed to QueryGrid in Darwin v2.2.
RETURN The darwin 1.6 function RETURN has been renamed to return in Darwin v2.2.
RandTree The darwin 1.6 function RandTree has been renamed to CreateRandMultAlign in Darwin v2.2.
RandomPermut The darwin 1.6 function RandomPermut has been renamed to CreateRandPermutation in Darwin v2.2.
RandomSeq The darwin 1.6 function RandomSeq has been renamed to CreateRandSeq in Darwin v2.2.
RandomTrees The darwin 1.6 function RandomTrees has been renamed to CreateRandTrees in Darwin v2.2.
Refine The darwin 1.6 function Refine is obsolete, use Align.
RefineLog The darwin 1.6 function RefineLog is not implemented in Darwin v3.0.
RefineShake The darwin 1.6 function RefineShake is obsolete, use Align.
SameTree The darwin 1.6 function SameTree has been renamed to IdenticalTrees in Darwin v2.2.
Scale The darwin 1.6 function Scale has been renamed to DayMatrixScale in Darwin v2.2.
SearchDF The darwin 1.6 function SearchDF has been renamed to SearchDb in Darwin v2.2.
SearchText The darwin 1.6 function SearchText has been renamed to CaseSearchString in Darwin v2.2.
Sequences The darwin 1.6 function Sequences has been renamed to Sequence in Darwin v2.2.
ShortestPath The darwin 1.6 function ShortestPath has been renamed to ConShortestPath in Darwin v2.2.
ShortestPath2 The darwin 1.6 function ShortestPath2 has been renamed to ShortestPath in Darwin v2.2.
Smooth The darwin 1.6 function Smooth has been renamed to SmoothData in Darwin v2.2.
SplatTree The darwin 1.6 function SplatTree has been renamed to DrawUnrootedTree in Darwin v2.2.
SplatTree The darwin 2.1 function DrawSplatTree has been renamed to DrawUnrootedTree in Darwin v2.2.
SprintMatch The darwin 1.6 function SprintMatch has been renamed to DynProgStrings in Darwin v2.2.
Ssystem The darwin 1.6 function Ssystem has been renamed to TimedCallSystem in Darwin v2.2.
StackedBar The darwin 1.6 function StackedBar has been renamed to DrawStackedBar in Darwin v2.2.
Stats The darwin 1.6 function Stats has been renamed to Stat in Darwin v2.2.
Strings The darwin 1.6 function Strings has been renamed to string in Darwin v2.2.
SummarizeTree The darwin 1.6 function SummarizeTree has been renamed to CollapseNodes in Darwin v2.2.
TSP The darwin 1.6 function TSP has been renamed to ComputeTSP in Darwin v2.2.
TSP3 The darwin 1.6 function TSP3 has been renamed to ComputeCubicTSP in Darwin v2.2.
TSP4 The darwin 1.6 function TSP4 has been renamed to ComputeQuadraticTSP in Darwin v2.2.
TreeOrder The darwin 1.6 function TreeOrder has been renamed to FindCircularOrder in Darwin v2.2.
TrulyRandom The darwin 1.6 function TrulyRandom has been renamed to SetRandSeed in Darwin v2.2.
UUUP The darwin 1.6 function UUUP has been renamed to CodonToInt in Darwin v2.2.
UnCompressGF The darwin 1.6 function UnCompressGF has been renamed to UncompressGrid in Darwin v2.2.
UnLabelTree The darwin 1.6 function UnLabelTree has been renamed to UnlabelLeaves in Darwin v2.2.
UnionStats The darwin 1.6 function UnionStats has been renamed to UnionStat in Darwin v2.2.
Violations The darwin 1.6 function Violations has been renamed to FindSpeciesViolations in Darwin v2.2.
WriteMSA The darwin 1.6 function WriteMSA has been renamed to WriteMsa in Darwin v2.2.
appendto The darwin 1.6 function appendto has been renamed to AppendFile in Darwin v2.2.
clearw The darwin 1.6 function clearw has been renamed to ClearStat in Darwin v2.2.
currentOfs The darwin 1.6 function currentOfs has been renamed to CurrentOff in Darwin v2.2.
dpuTime The darwin 1.6 function dpuTime has been renamed to DpuTime in Darwin v2.2.
eigenvalues The darwin 1.6 function eigenvalues has been renamed to Eigenvalues in Darwin v2.2.
externcall The darwin 1.6 function externcall has been renamed to CallExternal in Darwin v2.2.
findkey The darwin 1.6 function findkey has been renamed to FindKey in Darwin v2.2.
function The darwin 1.6 function function has been renamed to noeval in Darwin v2.2.
gausselim The darwin 1.6 function gausselim has been renamed to GaussElim in Darwin v2.2.
gcm The darwin 1.6 function gcm has been renamed to CodonToA in Darwin v2.2.
kGramRegion The darwin 1.6 function kGramRegion has been renamed to GramRegion in Darwin v2.2.
kGramRegionScore The darwin 1.6 function kGramRegionScore has been renamed to GetGramRegionScore in Darwin v2.2.
kGramSite The darwin 1.6 function kGramSite has been renamed to GramSite in Darwin v2.2.
kGramSiteScore The darwin 1.6 function kGramSiteScore has been renamed to GetGramSiteScore in Darwin v2.2.
load The darwin 1.6 function load has been renamed to ReadLibrary in Darwin v2.2.
numeric The darwin 1.6 function numeric has been renamed to real in Darwin v2.2.
plot The darwin 1.6 function plot has been renamed to DrawPlot in Darwin v2.2.
read The darwin 1.6 function read has been renamed to ReadProgram in Darwin v2.2.
readBRK The darwin 1.6 function readBRK has been renamed to ReadBrk in Darwin v2.2.
readDSSP The darwin 1.6 function readDSSP has been renamed to ReadDssp in Darwin v2.2.
readfile The darwin 1.6 function readfile has been renamed to ReadRawFile in Darwin v2.2.
readpipelines The darwin 1.6 function readpipelines has been renamed to OpenPipe in Darwin v2.2.
readstat The darwin 1.6 function readstat has been renamed to ReadLine in Darwin v2.2.
readstatAt The darwin 1.6 function readstatAt has been renamed to ReadOffsetLine in Darwin v2.2.
searchtext The darwin 1.6 function searchtext has been renamed to SearchString in Darwin v2.2.
specfunc The darwin 1.6 function specfunc has been renamed to specuneval in Darwin v2.2.
srand The darwin 1.6 function srand has been renamed to SetRand in Darwin v2.2.
system The darwin 1.6 function system has been renamed to CallSystem in Darwin v2.2.
text The darwin 1.6 function text has been renamed to string in Darwin v2.2.
update The darwin 1.6 function update has been renamed to UpdateStat in Darwin v2.2.
writeto The darwin 1.6 function writeto has been renamed to WriteFile in Darwin v2.2.
Predict The darwin 1.6 function Predict has been renamed to PredictSecStruct in Darwin v2.2.
NDF The darwin 1.6 function NDF has been renamed to NucDB in Darwin v2.2.
Simil The darwin 1.6 function Simil has been renamed to Sim in Darwin v2.2.
Text The darwin 1.6 function Text has been renamed to string in Darwin v2.2.
PDF The darwin 1.6 function PDF has been renamed to PepDB in Darwin v2.2.
MaxSimil The darwin 1.6 function MaxSimil has been renamed to MaxSim in Darwin v2.2.
MinSimil The darwin 1.6 function MinSimil has been renamed to MinSim in Darwin v2.2.
GetPam The darwin 1.6 function GetPam has been removed, use Align in Darwin v4.0.
WriteFile The darwin 2.0 function WriteFile has been renamed to OpenWriting in Darwin v2.2.
AppendFile The darwin 2.0 function AppendFile has been renamed to OpenAppending in Darwin v2.2.
SearchPepDF The darwin 1.6 function SearchPepDF has been renamed to SearchSeqDb in Darwin v2.2.
Scramble The darwin 1.6 function Scramble has been renamed to Shuffle in Darwin v2.2.
AToGenCode The darwin function AToGenCode has been renamed to AToCodon
IntToGenCode IntToGenCode has been renamed to IntToCodon
NucToCode NucToCode has been renamed to CodonToCInt
CodeToNuc CodeToNuc has been renamed to CIntToCodon
CodonToAAA CodonToAAA has been renamed to CIntToAAA
CodonToAmino CodonToAmino has been renamed to CIntToAmino
GenCode GenCode has been renamed to CodonToA
NToInt NToInt has been renamed to BToInt
NucToInt NucToInt has been renamed to BBBToInt
NucleicToInt NucleicToInt has been renamed to BaseToInt
GenCodeToInt GenCodeToInt has been renamed to CodonToInt
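Some of the entries above chain: writeto was renamed to WriteFile, and
WriteFile was in turn renamed to OpenWriting. When porting old scripts it can
help to resolve such chains mechanically. The following is a minimal sketch in
Python, not part of Darwin. The table RENAMES holds only a few pairs copied
from the list above, and the function current_name is a hypothetical helper.
Following a chain across versions is an assumption; the list itself does not
state that the intermediate names are interchangeable.

```python
# A few old-name -> new-name pairs copied from the list above.
RENAMES = {
    "writeto": "WriteFile",
    "WriteFile": "OpenWriting",
    "appendto": "AppendFile",
    "AppendFile": "OpenAppending",
    "gcm": "CodonToA",
    "GenCode": "CodonToA",
    "system": "CallSystem",
    "MoveGap": "MoveGap",  # listed as renamed to itself
}


def current_name(name, renames=RENAMES):
    """Follow renames until a name with no further entry is reached."""
    seen = {name}
    while name in renames:
        successor = renames[name]
        if successor in seen:
            break  # guard against entries that map back to an earlier name
        seen.add(successor)
        name = successor
    return name


newest = current_name("writeto")
```

Names that do not appear in the table are returned unchanged, so the helper
can be applied to every identifier in a script.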
classes class structures data structures
Data structures in Darwin are represented by a name followed by the fields
in parentheses. For example:
Complex( 1.0, 2.0 )
The data structures, syntactically, are identical to function calls, where
the function name is the data structure name and the arguments of the call are
the fields of the structure. A data structure may have its name defined as a
procedure. In this case, the procedure is normally used to check the validity
of the arguments, to simplify the structure if needed and/or to put it in
normal form. For example:
Complex := proc( realpart:numeric, imagpart:numeric )
if imagpart=0 then realpart else noeval(Complex(args)) fi end:
The noeval around the returned value is needed to avoid an infinite recursion
on the name of the data structure; we do not want this final structure to be
evaluated again, since it has been checked already.
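The role of such a constructor procedure (check the arguments, then simplify
or build the structure) can be sketched outside Darwin. The following is a
minimal Python model, not Darwin code. The class Complex here is a plain
record, and make_complex is a hypothetical name for the checking constructor;
building the record directly plays the part of noeval, since it creates the
structure without running the check again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complex:
    """Plain record with two fields; constructing it performs no checks."""
    realpart: float
    imagpart: float


def make_complex(realpart, imagpart):
    """Validate the arguments, then return the structure in normal form."""
    for part in (realpart, imagpart):
        # bool is excluded explicitly because it is a subclass of int.
        if isinstance(part, bool) or not isinstance(part, (int, float)):
            raise TypeError("both parts of a Complex must be numeric")
    if imagpart == 0:
        # Simplification: a zero imaginary part collapses to the real part.
        return realpart
    # Build the record directly; the arguments have been checked already.
    return Complex(realpart, imagpart)


z = make_complex(1.0, 2.0)
r = make_complex(1.0, 0)
```

Here z is a Complex record, while r is the plain number 1.0, which matches the
simplification performed when the imaginary part is zero.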
To construct a data structure, the functional syntax is used. To select a
component, selection by a positive integer always returns the corresponding
field. Particular data structures may define special named selectors. These
are handled by the function StructureName_select.
The following are the data structures currently implemented in Darwin. Use
?xxx to find the particulars of the structure xxx.
AlSumm Fold MSAMethod Residue
Alignment Gap MSAStatistics SparsePFA
AllAllResult GapHeuristic Machine Stat
Block GapMatch MySqlResult TaxonomyEntry
Chain Gene NucPepMatch TestStatResult
CoalescentNode GenomeSummary OrthologousGroup Tree
CodonMatrix GramRegion PartialOrder TreeConstruction
Covariance GramSite Partitions TreeResult
DataMatrix Graph Polar TreeStatistics
Dependency History ProbabilisticFA UnionFind
Description IntronModel Process
Edge LeafNode RecombinationNode
EvolTree MAlignment Region
See also: ?select
darwin
Darwin
Darwin (Data Analysis and Retrieval With Indexed Nucleotide/peptide
sequences) is an interactive system for doing Bioinformatics, in particular,
sequence matching and sequence analysis. It is presently being developed at
the E.T.H. in Zurich by the Computational Biochemistry Research Group. The
development of the system and its use to solve real problems go in parallel;
the more capabilities the system has, the more complicated problems we can
solve, which means more theory and more algorithms we want to implement.
Darwin resembles the Maple symbolic algebra system (Maple Reference Manual,
Char et al., Fifth edition, 1988) more than just superficially. Darwin works
in ``calculator mode''. This means that Darwin will wait for the user to type
in a command, execute the given command, print the answer (if any) and wait
for more input from the user, repeating the above. Darwin indicates it is
waiting for input from the user by printing a ">" character at the beginning
of a line and waiting with the cursor positioned in that same line.
A command to Darwin is called a ``statement'' and is always terminated by a
semi-colon (;) and a carriage return (typically the key labelled ``return'' or
``enter''). Note that until Darwin reads a semi-colon and a carriage return,
it will not consider its input completed and will not do anything with it.
proc, procedure, functions, parameters
A procedure or function in Darwin is defined with the syntactic construct
"proc" ... "end". Functions (returning one or more values), procedures
(functions not returning any value) and Object constructors (functions which
return a data structure) are defined by the same construct. A "proc" is the
main way of defining procedures, but it is also possible to generate
procedures with the arrow notation ("->") and with the use of high level
functions, like Inherit (see ?OO). Procs are also the main vehicle to define
classes or data structures.
A procedure has the following components:
proc( param1:type1, param2:type2, ... ) -> ReturnType;
local var1, var2, .... ;
global gvar1, .... ;
option opt1, .... ;
description '....';
. . . . . .
end:
The formal parameters, enclosed by parentheses right after the "proc" token,
define the arguments which may be passed to the procedure. The actual number
of arguments passed to the procedure in a call may differ from the number
specified in the proc. The following rules apply:
(1) The formal parameters have an optional type specification.
(2) All the parameters which have a type will be type-checked if they are
present when calling the procedure. If their type does not match, a
suitable error is produced.
(3) Parameters which are not present are not type-checked. If a
non-passed parameter is used, then a suitable error is produced.
(4) Parameters are passed by value/reference. However, if the value of a
parameter is just a name, the procedure may further evaluate this name or
assign values to it. Data structures or lists can be modified, and if
passed as parameters and modified, will remain modified for the caller.
(5) If additional parameters are passed, then these parameters are neither
checked nor accessible by name. They can be accessed with "args" (all the
parameters) or with "args[i]".
(6) Passing more or fewer parameters does not cause an error by itself. An
error is caused only when a missing parameter is used. The number of
parameters which are actually passed is available in the body of the
procedure with the name "nargs".
(7) When defining a class, the parameter names become the field names (and
their types) of the class.
(8) Optional parameters are defined in a slightly different way and are
separated from the rest by a semicolon ";". For a full description of
optional parameters see ?OptionalParameters.
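Rules (2), (3), (5) and (6) can be illustrated with a small model. This is a
Python sketch, not Darwin code; the procedure scale and its two formal
parameters a:numeric and b:numeric are hypothetical.

```python
def is_numeric(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def scale(*args):
    """Model of a hypothetical  scale := proc( a:numeric, b:numeric ) ... end"""
    nargs = len(args)  # Darwin makes this count available as "nargs"

    # Rule (2): typed formal parameters that are present are type-checked.
    for name, value in zip(("a", "b"), args):
        if not is_numeric(value):
            raise TypeError(f"{name}: expected numeric, got {value!r}")

    # Rule (3): using a parameter that was not passed is an error.
    if nargs < 1:
        raise TypeError("a was used but not passed")
    result = args[0]

    # Rule (6): a missing b is never touched, so it causes no error.
    # Rule (5): parameters beyond b are reachable only through args[i].
    for value in args[1:]:
        result *= value
    return result


print(scale(3), scale(3, 4), scale(3, 4, 5))
```

Note that the call with three arguments succeeds even though only two formal
parameters are declared.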
The type following the arrow ("->") is optional and it indicates the type of
result that should be returned. If a type is specified, the procedure will
check that the returned value is of this type. If the type does not match, a
suitable error is produced. This makes it possible to write procedures which
are completely type-safe. If the procedure returns an expression sequence,
type checking is not possible.
To pretty-print an entire procedure, i.e. print all the statements
reformatted according to darwin's standard indentation rules, you should use
print( disassemble(xxx) ) where xxx is a procedure. If xxx is the name of a
procedure, then print( disassemble(op(xxx)) ) should be used.
See also: ?local ?global ?OO ?OptionalParameters
optional, optional parameters
defaults
The Optional Parameters mechanism allows a flexible, uniform,
self-documenting and efficient way of passing optional parameters to
functions, procedures or constructors. The syntax is as follows: the optional
parameters are separated from the rest of the parameters by a semicolon.
SomeProc := proc( parm1:type1, ... ; opt1, opt2, ... ) .... end:
The parameters defined before the semicolon are the regular parameters, and
their behaviour is as usual, except for the fact that when the procedure is
invoked, all the regular parameters must be present. I.e. in the example
below, f has to be called with at least one parameter (a set). The parameters
defined after the semicolon are the optional parameters. Two examples are
given below:
f := proc( a:set ; b:posint, (c=''):string ) ... end:
g := proc( ; 'mode'=(m:string), d:anything ) ... end:
The definition of an optional parameter is as follows (when ambiguous,
"actual parameter" stands for the use of a parameter in a function call,
"formal parameter" stands for the definition of a parameter in the proc
statement):
(1) Each optional parameter definition is a type definition.
(2) The definition or exactly one of the subexpressions in each definition
must be a "colon" expression, e.g. b:posint in f, m:string in g. For
type-matching purposes, a colon expression matches the type defined on its
right part. E.g. b:posint matches a posint.
(3) The left part of a colon expression establishes the name of the variable
that will hold (part of) the parameter. It has two possible formats:
name:type or (name=value):type
(4) The name specified in the left part of a colon expression is the name of a
local variable inside the function/procedure that will hold the value of
what matches on the right part of the colon expression. E.g. f({5},ACGT,
7) will result in the local variable b assigned 7 and the local variable c
assigned ACGT.
(5) If the left part of the colon expression is of the type name=value, then
if no parameters match the optional parameter, the given name will be
assigned "value". This is the preferred mechanism to define default
values for unspecified parameters. E.g. f({0},3) will result in b
assigned 3 and c assigned '', the empty string.
(6) On calling a function/procedure with optional parameters, each actual
optional parameter is paired against the first formal parameter that
matches its type. The actual parameters are matched from left to right.
E.g. g(mode=exact) will assign "exact" to the local variable m, g([1])
will assign [1] to the local variable d.
(7) Once a formal parameter has been matched with an actual parameter,
its associated name is assigned, and this formal parameter cannot be
paired with any other actual parameter. E.g. f({1},2,3) will assign 2 to
b, and then will give an error, since 3 cannot be matched against any
formal parameter (not yet matched). Notice that the number of actual
parameters cannot be larger than the number of formal parameters when
optional parameters are used.
(8) Once all the actual parameters are paired, any remaining formal
parameters which are not paired yet and have a colon expression of the
form (name=value):type will have their corresponding local variables
assigned their default values. E.g. f({3}) will leave b unassigned and
assign '' to c.
The following are some worked examples relating to some known functions or
common situations:
(I) Align := proc( s1:string, s2:string ; (dm=DM):{Dayhoff,list(Dayhoff)}, ... )
The function Align always requires two sequences which are strings. Those
will be required on each use and will be s1 and s2. The first optional
argument is a Dayhoff or list of Dayhoff matrices. If none is supplied,
the function will have dm assigned the variable DM (which is normally
assigned to the default Dayhoff matrix).
(II) Align := proc( ..., (Method='Local'):{'Local','Global','CFE','Shake'} ...
The next optional argument defines the method to be used. The method can be
given as a name/string. Only 4 strings are valid methods, and if none
is provided, Method is assigned 'Local'. This also resolves the problem
of incorrectly specifying more than one method: once the formal
parameter is matched, it cannot be matched again.
(III) SomeClass := proc( ...., Comment:string )
Having an optional parameter at the end of the parameter list, so that it
catches any leftover string, is a good way to allow optional
informational data like comments.
(IV) Entry := proc( e:posint ; (db=DB):database )
For system-wide variables, like DB, the default database, which are 99% of
the time used from their default values, this definition provides the
added flexibility that it does not require anything when the default is
used, and if a database is passed as an argument, then it will be used
correctly (without any extra work inside the function).
(V) DrawTree := proc( ..., 'LengthFormat' = ((lf='%d'):{string,procedure})
DrawTree is a function which has a great many options, most of which have
practical defaults. In this case, the format for displaying branch
lengths is by default an integer. It could be some other printf format
(which is a string) or some procedure which produces the display. The
internal variable lf is assigned the right information, all error
checking and defaults are done automatically. No disassembling of the
parameter is needed.
Finally here is a more formal definition of the steps followed by the
evaluation of parameters in the presence of optional parameters:
(i) The regular parameters (must all be there) are assigned and type-checked
if they have a type definition.
(ii) Each actual optional parameter (from left to right) is matched against
the unmatched formal parameters (from left to right). An unmatched
actual optional parameter gives an error.
(iii) The unmatched formal parameters that have a default value definition are
evaluated and the corresponding local variables assigned. This
evaluation is done with access to all regular parameters and optional
parameters already assigned.
(iv) For further clarity: the types are never evaluated; they are as given in
the proc statement. Only the "value" part of a (name=value):type is
evaluated in full.
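Steps (ii) and (iii) above can be sketched as a small matching routine. This
is a Python model under stated assumptions, not the Darwin kernel: types are
represented by predicate functions, defaults are plain values rather than
expressions evaluated on demand, and the names match_optional, is_posint and
is_string are hypothetical. The formal list F mirrors the optional part of
f := proc( a:set ; b:posint, (c=''):string ) from the text.

```python
NO_DEFAULT = object()  # marks a formal declared as name:type (no default)


def is_posint(v):
    return isinstance(v, int) and not isinstance(v, bool) and v > 0


def is_string(v):
    return isinstance(v, str)


def match_optional(formals, actuals):
    """Pair each actual with the first unmatched formal whose type accepts it."""
    values = {}
    for actual in actuals:                      # actuals, left to right
        for name, accepts, _default in formals:  # formals, left to right
            if name not in values and accepts(actual):
                values[name] = actual
                break
        else:
            # Step (ii): an unmatched actual optional parameter is an error.
            raise TypeError(f"{actual!r} matches no unmatched optional parameter")
    # Step (iii): remaining formals with a default get that default.
    for name, _accepts, default in formals:
        if name not in values and default is not NO_DEFAULT:
            values[name] = default
    return values


# Optional part of f: b:posint, (c=''):string
F = [("b", is_posint, NO_DEFAULT), ("c", is_string, "")]

print(match_optional(F, ["ACGT", 7]))
print(match_optional(F, [3]))
print(match_optional(F, []))
```

In the last call b stays unassigned, which the model represents by leaving
the key out of the result.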
local, lexical scope, lexically scoped, access rules
temporary variables, lexical, scoping
The "local" statement in a procedure body defines the variables which are
local to the procedure, that is, variables that will only exist for each
invocation of the procedure. The variables are not initially assigned any
value, nor do they retain any value after the procedure ends its execution.
Recursive invocations of a procedure will have their own set of local
variables, distinct for every invocation. Normally it is not necessary to
define any local variables, as any variable which is assigned in the body of
the procedure (either explicitly or implicitly in a for-loop) will be made
automatically local. To enforce that an assigned variable be global, it must
be defined in the global statement (see ?global).
Local variables will be accessible to any procedure which is defined inside
the body of the procedure. This is normally called "lexically scoped
variables". The following example clarifies the access rules for variables.
outer := proc( a:numeric )
local x;
x := a+w;
inner := proc( b:numeric )
y := x+b+z
end:
x+inner(7)
end:
The above code defines a procedure called outer. This procedure has one
formal parameter, "a". It also defines a local variable "x"; but this
definition is redundant, as x is assigned inside outer and will automatically
be defined local. "inner" is also a local variable of "outer" as it is
assigned a value inside it. "inner" is a procedure which takes one argument,
"b". "inner" will have a local variable, "y", which is assigned inside its
body. The assignment inside "inner" illustrates all the types of access to
variables: y is local, b is a parameter. Parameters and locals have the
highest binding strength, that is, they dominate over other forms of
reference. "x" is external to "inner" but local to "outer", the procedure in
which "inner" is defined. So the "x" in "x+b+z" refers to the local "x" in
outer. "x" is called a lexically scoped variable. It has the second binding
strength. Finally, both "w" and "z" are neither parameters nor locals of any
of the functions and hence are global.
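The same access rules can be traced in a language with comparable lexical
scoping. The following Python sketch mirrors the outer/inner example; the
values given to the globals w and z are assumptions chosen only so that the
result can be followed by hand, and the explicit return statements replace
Darwin's convention of returning the last evaluated value.

```python
w = 10    # global: neither a parameter nor a local of any function
z = 100   # global


def outer(a):
    x = a + w                # x is local to outer; w is global

    def inner(b):
        y = x + b + z        # y: local, b: parameter, x: from outer, z: global
        return y

    return x + inner(7)


# With a = 1: x = 11, inner(7) = 11 + 7 + 100 = 118, result = 129
print(outer(1))
```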
See also: ?global, ?proc, ?OO
abbreviation, acronyms
syllables, hyphenation
English dictionaries provide the legal hyphenation pattern for a word, e.g.
ap . prox . i . mate, usually in bold face. This does not necessarily
correspond to the syllables of the word (these are typically given with the
international pronunciation), e.g. in the Oxford English Dictionary (OED).
We will use the syllables of a word to create abbreviations for names which
are too long in Darwin. The convention is as follows:
When names are abbreviated in Darwin, we use the first syllable of a word
according to the OED. If this abbreviation is either (1) too short
for uniqueness, (2) unaesthetic or (3) extremely unpronounceable,
the second syllable of the word is added. Subsequent syllables are
added until problems (1) - (3) disappear.
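The convention can be sketched as a loop that adds syllables until the
abbreviation is acceptable. This is a Python illustration only: the function
abbreviate is hypothetical, the syllable splits are supplied by the caller
(here they are assumed splits, not taken from the OED), and criteria (2) and
(3) are judgment calls that the sketch replaces with a simple minimum length.

```python
def abbreviate(syllables, taken=(), min_len=3):
    """Add syllables one at a time until the result is long enough and unique."""
    abbrev = ""
    for syllable in syllables:
        abbrev += syllable
        if len(abbrev) >= min_len and abbrev not in taken:
            return abbrev
    return abbrev  # all syllables used: the full word is the only option


# Assumed syllable splits, for illustration only.
print(abbreviate(["Ap", "prox", "i", "mate"]))       # first syllable too short
print(abbreviate(["Sim", "i", "lar", "i", "ty"]))    # first syllable suffices
print(abbreviate(["Sim", "i", "lar", "i", "ty"], taken={"Sim"}))
```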
There is a small number of computer and biological abbreviations common to
both literatures. These abbreviations do not follow the above principle but
may be used throughout the system and the onus lies on the user's shoulders to
identify their meanings. In general, this list should be kept as small as
possible. There is a delicate balance between the advantages of having short
names in the system and the disadvantages of having too many abbreviations.
Abbreviations from Computer Science:
abbreviation description
DB database
eval evaluate
int Integer
IPC Inter-process communication
LS Least Squares
Svd Singular value decomposition
TSP Travelling salesman problem
UTC Universal time coordinated (Greenwich time)
Abbreviations from Biology are:
abbreviation description
A Amino acid (single letter code)
AAA Amino acid (3-letter code)
AC Accession number, (used by SwissProt database)
Amino Amino acid (fully spelled)
B Base part of nucleotide, (one letter code)
Base Base part of nucleotide, (fully spelled)
BBB Base part of nucleotide, (3-letter code)
CInt an integer between 1 and 64 identifying a codon (3 bases)
Codon 3 bases in a single string, e.g. "ACT"
DM Dayhoff matrix
DNA deoxyribonucleic acid, (A,C,G or T)
ID Identification number, (used by SwissProt database)
MSA Multiple Sequence Alignment
NP Nucleotide-peptide
PAM Point accepted mutations, a measure of distance
Pep Peptide (amino acid)
RNA ribonucleic acid, (A,C,G or U)
Sim Similarity score
tRNA transfer-RNA, a molecule translating codons to peptides
ipcsend
ipcsend
ipcsend is a simple UNIX program, included in the darwin distribution
package, that sends darwinipc messages directly to the darwinipc daemon. It is
useful for testing the darwinipc daemon. For the complete set of messages that
can be used with ipcsend see the darwinipc help file. (?darwinipc)
Usage is as follows:
ipcsend [timeout] message
timeout (optional) time in seconds to wait for a response from the daemon.
Default is 3 seconds.
message Message to be sent. If this message contains any characters that are
interpreted by the shell, or a sequence of blanks, quote it.
Examples:
>CallSystem('ipcsend MSTAT mendel;');
DATA mendel 0:OK BUSY:
>CallSystem('ipcsend PING;');
PING OK
See Also:
ConnectTcp darwinipc DisconnectTcp ParExecuteIPC ReceiveTcp SendTcp
darwinipc, IPC
darwinipc, IPC, interprocess communication
darwinipc is the interprocess communication program that is distributed with
darwin. It enables two or more darwin processes (on the same or different
machines) to communicate with each other via TCP/IP. The darwinipc daemon
establishes TCP/IP connections between machines and UNIX internal protocol
connections to local processes which want to communicate via the daemon. It
also starts and controls remote jobs. This daemon runs once on each machine.
For TCP/IP communication, darwinipc uses the port number defined by the
Darwin entry in /etc/services. If no such port is defined, it will use the
fixed port 12345. The /etc/services file on all machines you want to use
should therefore contain the following line:
darwin 12345/tcp Darwin DARWIN #Darwin IPC
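The port selection described above can be mimicked from a client script. This
is a Python sketch, not part of the darwin distribution: the function
darwinipc_port is hypothetical, and it assumes the service is registered
under the name "darwin" with protocol "tcp", as in the line shown.

```python
import socket


def darwinipc_port(lookup=socket.getservbyname):
    """Return the port of the 'darwin' service, or 12345 if none is defined."""
    try:
        return lookup("darwin", "tcp")   # consults the services database
    except OSError:
        return 12345                     # fixed fallback port from the text


print(darwinipc_port())
```

The lookup argument exists only so that the fallback can be exercised
without editing /etc/services.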
Whenever a connection to the daemon is established, the correct password
must be sent as the first data. The password is read from the file name defined
by the IPC_PW environment variable, or from $HOME/.ipcpw if IPC_PW is not
defined, or from .ipcpw if $HOME is undefined. Make sure this password file
cannot be read by unauthorized users!
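The lookup order for the password file can be written down as follows. This
is a Python sketch of the rule as stated, not code from darwinipc; the
function name ipc_password_path is hypothetical, and the environment is
passed in as a dictionary so that each case can be shown.

```python
import os
import posixpath


def ipc_password_path(env=None):
    """Resolve the password file: IPC_PW, then $HOME/.ipcpw, then .ipcpw."""
    env = os.environ if env is None else env
    if "IPC_PW" in env:
        return env["IPC_PW"]
    if "HOME" in env:
        return posixpath.join(env["HOME"], ".ipcpw")
    return ".ipcpw"


print(ipc_password_path({"IPC_PW": "/secret/pw", "HOME": "/home/u"}))
print(ipc_password_path({"HOME": "/home/u"}))
print(ipc_password_path({}))
```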
As of March 2003, darwinipc, via a system call, uses ssh instead of rsh to
start the darwinipc daemon on remote machines. For this to work properly, ssh
must be configured to run without asking for a password. This can be
accomplished with the .ssh/known_hosts file. Please read the ssh
documentation to configure it properly.
Usage is as follows:
darwinipc [-l] [-L] [-U] [-u user] [-t timeout] port
The command accepts the following options:
port - The UNIX internal protocol port name (/tmp/.darwinipc).
-l - Causes activity log to be written to stdout (default: no
log)
-L - Causes activity log to include data received and sent.
-U - Unbuffered option. The standard output, when redirected
to a file, is not buffered. (useful for debugging.)
-u user - Adds user to the list of users that do not affect login
control (default: all users affect login control)
-t timeout - Sets the time between login and machine load checks to
timeout seconds (default: 10)
The darwinipc daemon understands the following set of commands:
Message: EXIT
Purpose: Exit the daemon.
Replies: Nothing or ERROR message.
Example: EXIT
Message: JOBS mach
Purpose: Returns the jobs controlled by mach and their status
Replies: DATA mach 0:JOBS {pid RUNNING|STOPPED} or DATA mach 0:ERROR
message or ERROR message
Example: JOBS mendel, returns DATA mendel 0:JOBS 8281 RUNNING
Message: LOADC mach low high
Purpose: Sets load thresholds for mach to low and high (defaults are 0.7
and 2.0)
Replies: nothing or DATA mach 0:ERROR message or ERROR message.
Example: LOADC mendel 0.7 2.0
Message: LOGINC mach ON|OFF
Purpose: Turn login control for mach on or off (turned off by default).
Replies: nothing or DATA mach 0:ERROR message or ERROR message
Example: LOGINC mendel ON
Message: MAXJB mach max
Purpose: Sets maximum number of RUN jobs for mach to max (defaults to 1).
Replies: nothing or DATA mach 0:ERROR message or ERROR message
Example: MAXJB mendel 2
Message: MSTAT mach
Purpose: Returns status of mach
Replies: DATA mach 0:OK ALIVE (machine is alive and maximum number of
RUN jobs is not reached) or DATA mach 0:OK BUSY (machine is alive
and further RUN jobs will be rejected or stopped immediately) or
DATA mach 0:OK DOWN or DATA mach 0:OK STARTED or ERROR message
Example: MSTAT mendel, returns DATA mendel 0:OK ALIVE
Message: OFFHR mach start end
Purpose: Sets off hours for mach to be from start to end
Replies: nothing or DATA mach 0:ERROR message or ERROR message
Example: OFFHR mendel 8 18
Message: PING
Purpose: Check whether daemon is running on machine from which command is
issued
Replies: PING OK or nothing if daemon is not running
Example: PING, returns PING OK
Message: PSTAT mach pid
Purpose: Returns status of process pid on mach
Replies: DATA mach 0:OK STOPPED or DATA mach 0:OK RUNNING or
DATA mach 0:OK NONE or ERROR message.
Example: PSTAT mendel 8281, returns DATA mendel 0:OK STOPPED
Message: REXIT mach
Purpose: Exit the remote daemon on mach.
Replies: Nothing
Example: REXIT mendel
Message: RSH mach cmd
Purpose: Run command cmd on mach (same as background rsh, but much faster.)
cmd is interpreted by csh.
Replies: Nothing or DATA mach 0:ERROR message or ERROR message.
Example: RSH mendel kill -STOP 8281
Message: RUN mach cmd
Purpose: Run controlled command cmd on mach (expects process group and id
being sent back by cmd). cmd is interpreted by csh.
Replies: DATA mach 0:OK pid or DATA mach 0:ERROR message or ERROR message.
Example: RUN mendel darwin -q outfile, returns DATA mendel 0:OK 8281
Message: SEND mach pid:data
Purpose: Send data packet to PID on mach. mach will receive DATA srcmach
srcpid:data, with srcmach and srcpid being the machine and pid of
the sending process.
Replies: nothing or DATA mach 0:ERROR message or ERROR message.
Example: SEND mendel 8281:[20.3,14.8] sent by pid 1365 on vinci. PID 8281
on mendel will receive the message DATA vinci 1365:[20.3,14.8]
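The message shapes listed above are regular enough to be handled by a few
lines of client code. The following Python sketch is an illustration only:
the helpers build_send and parse_data_reply are hypothetical, and the format
"DATA mach pid:payload" is inferred from the Replies entries above.

```python
def build_send(mach, pid, data):
    """Compose a SEND message: SEND mach pid:data"""
    return f"SEND {mach} {pid}:{data}"


def parse_data_reply(line):
    """Split 'DATA mach pid:payload' into (mach, pid, payload)."""
    kind, _, rest = line.partition(" ")
    if kind != "DATA":
        raise ValueError(f"not a DATA reply: {line!r}")
    header, sep, payload = rest.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in DATA reply: {line!r}")
    mach, pid = header.split()   # raises ValueError if the header is malformed
    return mach, int(pid), payload


print(build_send("mendel", 8281, "[20.3,14.8]"))
print(parse_data_reply("DATA vinci 1365:[20.3,14.8]"))
print(parse_data_reply("DATA mendel 0:OK ALIVE"))
```

Replies from the daemon itself carry pid 0, while data forwarded by SEND
carries the pid of the sending process.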
See Also:
ConnectTcp DisconnectTcp ipcsend ParExecuteIPC ReceiveTcp SendTcp
naming, name conventions, function names
name rules
The following is a short document sketching the Darwin naming convention.
We can group the different Darwin constructs into five categories:
built-in types
structured types
commands
built-in functions, and
library functions.
We give a short but reasonably precise set of rules for naming types,
structures, routines etc. for each of these categories. This document is
primarily meant for Darwin developers.
Built-in Types Rules
1) All built-in types should have names consisting of only lower case letters.
2) Only very common computer science names should be abbreviated (See
?abbreviations). For example "uneval" (short for "unevaluated").
Structured Types Rules
1) Only the first letter of each word should be capitalised.
2) Structured type names should be kept reasonably short. When abbreviations
seem appropriate, they should take place according to the rule above.
3) Abbreviations used in selector names should correspond to abbreviations
used throughout the system. It makes no sense to use the abbreviation
"Sim" for "Similarity" throughout the Darwin system and then require users
to select with Simil on DayMatrix structures.
4) When the structure is used in tandem with a routine, then the name of the
structure will coincide with the name of the function which constructs such
a structure. (See ?OO for more details)
5) Selector names should reflect the type of data they return. If they return
a simple type, they should have names formatted according to the naming
conventions for simple types. If they return structured types, they should
have names formed according to these rules.
Commands and Built-in Function Rules:
1) We follow computer science history for naming conventions as closely as
possible.
2) We always use lower case letters. Thus, "error" and "return". Both of
these functions act as commands in Darwin.
3) Mathematical functions are named according to Abramowitz and Stegun
conventions.
4) We stay with the conventions of the language "C" when the function is
sufficiently similar to the "C" routine (e.g. printf, sprintf, sscanf).
5) We stay with the conventions of the language "Maple" when the function is
sufficiently similar to a function from that language (or an exact copy).
6) If the routine has a common application in another field (such as the
"NBody" function does in physics), this name can be chosen. It would be
preferable to give it the more abstract mathematical name when such a name
exists.
7) If none of the above cases apply, we use the conventions of "Library
Functions" below.
Library Function Rules:
1) The name should consist of at most five parts:
<domain>_<adverb><verb><adjective><noun>
2) The "verb" should reflect the action in a meaningful way, i.e. Draw, Load,
Save, Print. For performing string searches, the verb should be Search.
If we are aligning sequences, it should be Align. If we are creating a
graphics file, it should be Draw. If a new object is returned or created,
it should be Create.
3) The "noun" will typically be the object if you were to say the sentence
completely. It will typically be a type or structured type. If the
routine works on a particular type, then this type should be placed in the
name as the noun. For example, CreateString. A noun should be chosen
that represents the generic object and is mathematical in nature, i.e. avoid
choosing names which are cute but little known.
4) The "adjective" should only be included when its absence would fail to
distinguish between the objectives of two or more routines. For example
DrawTree and DrawUnrootedTree.
5) The first and only the first letter of each word should be capitalised
unless it is part of an abbreviation common in the biology/biochemistry
literature (see ?abbreviations).
6) The "adverb" indicates a qualified action. For example,
ApproxSearchString. (Approximate (abbreviated) is the adverb, Search is
the verb, String is the object type.)
7) The "domain" is a special identifier used to indicate that the routine that
follows (i.e. the <adverb><verb><adjective><noun>) applies to a special
type of object. For example Inter Processor Communication abbreviated to
IPC and Nucleotide Peptide abbreviated to NucPep. For example, if we have a
function to align sequences GlobalAlignBestPam, which works for amino
acids, the same function working for nucleotide-peptides will be called
NucPep_GlobalAlignBestPam.
8) Abbreviations should be avoided. When function names are too long, the
adverb and adjective should be the first to be abbreviated. All
abbreviations should follow the rule above (see ?abbreviations).
9) Underscore characters should be avoided except to separate <domain> from
the rest of the name. Of course, underscore characters are needed for
polymorphism but this poses no problem with our conventions.
10) Nouns should be singular.
11) Nouns in their plural form will be used to define iterators. So Entry
defines a database entry, and Entries() is the iterator which goes over
all the entries of the default database.
12) Functions which perform "conversion" require a bit of extra attention. If
the conversion is from a type to a type, and it is expected to be done
(sometimes) automatically, then the name should be <type>_<type>
(See ?OO for more details on converters). For example, PatEntry_string.
For more general data converted into other data the name should be
<data>To<data>. For example, IntToAmino.
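The library naming rules can be summarised as a name-building helper. This is
a Python sketch for illustration: the functions library_name and
converter_name are hypothetical, the part order (domain, adverb, verb,
adjective, noun) is inferred from the examples ApproxSearchString,
DrawUnrootedTree and NucPep_GlobalAlignBestPam, and reading "Global" as the
adverb and "Best" as the adjective in the last of these is an assumption.

```python
def library_name(verb, noun="", adjective="", adverb="", domain=""):
    """Assemble a library function name from its (at most five) parts."""
    core = adverb + verb + adjective + noun
    # Only the domain is separated from the rest by an underscore.
    return f"{domain}_{core}" if domain else core


def converter_name(source, target):
    """Name for general data-to-data conversion, e.g. IntToAmino."""
    return f"{source}To{target}"


print(library_name("Search", "String", adverb="Approx"))
print(library_name("Draw", "Tree", adjective="Unrooted"))
print(library_name("Align", "Pam", adjective="Best", adverb="Global",
                   domain="NucPep"))
print(converter_name("Int", "Amino"))
```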
option, options
builtin, internal, numeric, polymorphic, trace, zippable
Procedure declarations allow the definition of options in their headers.
This definition is done with the keyword "option". Options are simple
identifiers, separated by commas and terminated with a semicolon. E.g.
f := proc( x:numeric )
option internal;
x+1
end:
Options can be added to procedures as desired. Programs can test for
options, by selecting the 3rd component of a procedure body. For example, the
expression op(3,op(f)) will return "internal" when f is defined as above.
The system recognizes 8 options which have the following interpretation:
builtin This is interpreted as a function whose definition is in the
kernel. The body of such a procedure should consist of a single
integer. This integer is fixed and links directly to an
internal function.
internal This means that the function is not intended for use by general
users. No help files will be generated for it.
numeric Functions which are internal and are mathematical, in the sense
of computing a numerical value, should have this option. A
different, faster, evaluator is used to evaluate them.
polymorphic This means that the given function defined is polymorphic.
When an unknown data structure is passed as an argument, then
the corresponding function will be called. For example:
f := proc( x ) option polymorphic; ....
Later calls like f( ABC(...) ) will attempt to execute ABC_f,
if this name is defined as a procedure. Or f( 1..2 ) will
attempt to execute range_f if this name is defined as a
procedure. See ?polymorphic or ?OO for more details on Object
Oriented programming in Darwin. When a data type/structure is
intended to be also a converter, e.g. Complex, then the
definition of the constructor should also carry option
polymorphic.
trace The corresponding function will have its current printing level
turned high enough so that the execution of all its statements
is printed out.
zippable This means that if the function would not compute when given an
array or a matrix, before issuing an error, an attempt will be
made to compute it as zip(f(x)).
NoIndexing This option is for a class or data structure. It means that
integer indexing will be disallowed both for assigning and for
selecting fields. Ranges of integers are also disallowed.
Setting this option enforces all access to the class through
the names of the fields or through the xxx_select function. If
this option is set and a xxx_select function is defined, then
any integer or range indexing will be passed to the xxx_select
function. Since accessing fields by position usually prevents
polymorphism, this option will help enforce object orientation.
NormalizeOnAssign This option is for a class or data structure. It will
force a normalization of the object every time that any of its
components is assigned. This is normally done when there are
many constraints between the components of a class, and it is
not possible to check a field assignment without checking the
entire object.
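The name-based dispatch behind option polymorphic can be modelled in a few
lines. This is a Python sketch, not the Darwin evaluator: the dictionary
procedures stands in for Darwin's table of named procedures, and Struct,
type_name and polymorphic are hypothetical helpers.

```python
procedures = {}   # stand-in for the procedures known to the system by name


class Struct:
    """Minimal stand-in for a Darwin data structure such as ABC(...)."""

    def __init__(self, name, *fields):
        self.name = name
        self.fields = fields


def type_name(value):
    if isinstance(value, Struct):
        return value.name
    if isinstance(value, range):
        return "range"          # models a Darwin range such as 1..2
    return type(value).__name__


def polymorphic(fname, generic):
    """Wrap 'generic' so that <type>_<fname> is tried first, if defined."""
    def dispatch(arg):
        special = procedures.get(f"{type_name(arg)}_{fname}")
        if special is not None:
            return special(arg)
        return generic(arg)
    return dispatch


f = polymorphic("f", lambda x: "generic f")
procedures["ABC_f"] = lambda s: "ABC_f"
procedures["range_f"] = lambda r: "range_f"

print(f(Struct("ABC", 1, 2)), f(range(1, 3)), f(42))
```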
selector, selectors, selection, fields, indexing
Selectors are the most common and efficient way to select components of a
structure in Darwin. The syntax for the selectors is the same as the one for
indexing arrays. That is, if s is a structure, s[x] is a selection or
indexing of s. A selector may be used to return a part of an expression or
may be used to modify the corresponding component of the structure. In all
cases, the behaviour of generalized selectors is identical to the behaviour of
array selection (indexing). The selector has several possible forms.
Selectors can be integers, names, strings or arbitrary expressions. If the
selectors are integers, or names (also strings) that coincide with the names
of the parameters of the data structures, then they are handled by the
kernel. Otherwise a function xxx_select will be invoked to resolve the
selection/assignment.
posint a positive integer i selects the ith element of the structure,
array, list, set, range, etc. Over strings, selection will return
the ith character.
integer a negative integer i selects the ith element from the right of the
structure, array, list, set, range, etc. Over strings, selection
will return the ith character from the right. That is, a[-1] is
equivalent to (and more efficient than) a[length(a)].
range an expression of the form a..b, where a and b are integers. The
selection will return the values from a to b as an expression
sequence. If a or b are negative they are interpreted as counting
from the right. So s[-2..-1] returns the last two components of s.
This form cannot be used in an assignment. For example, if s is
a structure with at least two elements, then s[1..2] will return an
expression sequence with two elements (suitable for use in lists,
sets and other structures). There is one exception to this rule:
range selection of lists returns a list.
name a name which coincides with a name of a parameter of the data
structure definition, selects that field. If a string is used
instead of a name, it has the same effect. The use of strings is
sometimes needed when the name has been used for a variable and
hence it has a value and cannot be used as a selector.
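The integer and range forms can be modelled as follows. This is a Python
sketch of the indexing arithmetic only: the function select is hypothetical,
a range a..b is written as the tuple (a, b), and a Python slice is returned
where Darwin would return an expression sequence (or a list, for lists).

```python
def select(seq, sel):
    """1-based selection; negative indices count from the right."""
    n = len(seq)

    def position(i):
        if i == 0 or abs(i) > n:
            raise IndexError(f"selector {i} out of range for length {n}")
        return i if i > 0 else n + 1 + i   # -1 maps to n, -2 to n-1, ...

    if isinstance(sel, tuple):             # range form a..b, both ends included
        a, b = position(sel[0]), position(sel[1])
        return seq[a - 1:b]
    return seq[position(sel) - 1]


s = ["a", "b", "c", "d"]
print(select(s, 1), select(s, -1))
print(select(s, (1, 2)), select(s, (-2, -1)))
print(select("ACGT", -1))
```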
AC ac
Alignment DayMatrix Identity Length1
Length2 Offset1 Offset2
PamDistance PamNumber PamVariance
Score Seq1 Seq2
Sim modes
Block DigestSeq
CodonMatrix AAPam CodonPam Desc
FixedDel IncDel Sim
Color colcode
Complex DigestSeq
ConsistentGenome name
Counter title value
Covariance CorrMatrix CovMatrix Description
Eigenvalues First MaxVariance
Maximum Mean Minimum
Number StorMatrix StorSum
VarNames Variance
DayMatrix DelCost Dimension FixedDel
IncDel Mapping MaxOffDiag
MaxSim MinSim PamDistance
PamNumber Sim StopSimil
logPAM1 pam type
DocEl content content_i name
tag
Document content_i
Edge From Label Node1
Node2 To
FileStat path st_atime st_blksize
st_blocks st_ctime st_dev
st_gid st_ino st_mode
st_mtime st_nlink st_rdev
st_size st_uid
Gap DigestSeq
Gene AlignErrors Division Exons
Introns NucEntry NucSequence
PepLength PepOffset PepSequence
mRNA
GenomeSummary EntryLengths Epithet FileName
FileNameOrig Genus Id
Kingdom Lineage String
TotAA TotChars TotEntries
Type sgml_tag string
type
Graph Adjacencies AdjacencyMatrix
Degrees Distances Incidences
Labels a n
ID id
Intron div n pam
IntronModel Acceptor Donor InIntron
MinLen
Leaf Height Label
LinearClassification HighestNeg LowestPos NumberNeg
NumberPos WeightNeg WeightPos
WeightedFalses X X0
LinearIntron F I minlen
n pam
LongInteger DigestSeq
Machine Class DownCount ForcedRun
LastProcess LoadRange LoginControl
MaxProcesses Name NiceValue
OffHours Processes StartCycle
User
MAlignment AlignedSeqs InputSeqs labels
t
MapleFormula expr
Match MatchParams
MySqlResult ColumnLabels Data
OrthologousGroup AllAll Length Seqs
Species Tree
Paragraph content content_i indent
PatEntry a
Permutation p
PlotArguments Axis Colors Grid
GridFormat LabelFormat Lines
Title TitlePts TitleX
TitleY
Polar DigestSeq
ProbSeq CharMap ProbVec
Process ElapsedTime EventTime Job
JobTime Pid Stopped
Stat Average CV Description
Excess Max Maximum
Mean MeanVar Min
Minimum Number ShortForm
Skewness StdErr VarVar
VarVariance Variance
SvdResult MinNorm2Err NData Norm2Err
Norm2Indep SensitivityAnalysis SingularValuesDiscarded
SingularValuesUsed SolutionVector
table Table_Default Table_Values key
TaxonomyEntry Children Common Name Lineage
Lineagestring Other names Parent
Scientific Name Species code Synonym
id
TestStatResult CountMatrix TestStat name
plog pstd pvalue
TextBlock blockname blocktype content
content_i
Tree Height Left Right
TreeResult Other Tree Type
UnionFind Clusters Col Elements
ElmInd Sizes
The user/programmer can define their own selectors for a particular structure.
See ?selector function for details.
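As a minimal sketch of the integer and range forms described above (the
variables s and L are hypothetical, and the comments restate the descriptions
above rather than captured output):
s := 'darwin':
L := [10,20,30,40]:
s[1];       # first character of the string
s[-1];      # last character, counted from the right
L[2];       # second element of the list
L[-2..-1];  # last two elements; range selection of a list returns a list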
See also: ?expseq, ?op, ?selector function
types
Types
Names which can be used as arguments of the type function and in general, as
arguments when a type is required.
AC Entry MapleFormula Protein
AcceptCriteria equal Match RandomGeneratorPFA
algebraic ETHMachine matrix range
Alignment EvolTree MatrixDecomposition RecombinationNode
AllAllResult expseq MSAMethod relation
AlSumm float MSAStatistics Residue
And Fold MySqlResult RNA
anything Font name SectionHeader
ARG ForLoop ndimPoint select
ARGNode Gap negative SeqThread
array GapHeuristic Nodes set
Assign GapMatch nonnegative Size
Block Gene Not SparsePFA
Bold GenomeSummary NucPepMatch Stat
boolean GramRegion numeric StatSeq
catenate GramSite Or Stop
Center Graph OrthologousGroup string
Chain HelpEntry Pair structure
CoalescentNode History Paragraph SvdResult
Code HyperLink Param symbol
CodonMatrix IfStat Parsimonious Table
Complex Indent PartialOrder table
compressed integer PartialOrderMSA Target
constant IntronModel Partitions TaxonomyEntry
Copyright Island PatEntry TestStatResult
Counter IT Permutation TextBlock
Covariance LastUpdatedBy PlotArguments times
database Leaf plus Tree
DataMatrix LeafNode PlusMin TreeConstruction
DayMatrix less Polar TreeResult
Dependency lessequal posint TreeStatistics
Description LinearClassification positive Triple
DigestionWeights List PostscriptFigure TT
DNA list power type
DocEl Local ProbabilisticFA unequal
Document LongInteger ProbSeq UnionFind
Edge Machine procedure VectorDB
Edges MAlignment Process
In addition to the above names, the following are also valid types:
type description
--------------------------------------------------------------------------------
matches a numeric with the same value
matches a string/symbol with the same characters
matches a list with the same length and
corresponding types. Ditto for relational operators
( = <> < <= >= > ), ranges, ands, ors, nots,
concatenation and selected names
{typ1,typ2,...} matches if any of the types in the set are matched
identical(xxx) matches xxx exactly
anyfunc(typ1,..,typn) matches any structure which has n arguments and
each argument matches the given subtype
structure(argtype,sname) matches a structure named sname with each argument
matching argtype. sname can be a set of names or
can be absent (any name).
matrix(subtype) matches a matrix of entries matching subtype
array(subtype,dim1,dim2..) matches a multidimensional array of the given
dimensions with entries matching subtype
(subtype) matches the named structured type when all the
components match the subtype
StructName(typ1,...,typn) matches a StructName structure with n arguments and
each argument matches the given subtype
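A few hedged examples of structured types used with the type function. The
test values are made up for illustration; if the descriptions above are read
literally, each call would be expected to return true:
type( 5, posint );
type( 'abc', {string,numeric} );
type( [[1,2],[3,4]], matrix(numeric) );
type( Complex(1,2), structure(numeric,Complex) );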
objectorientedprogramming polymorphic
object oriented
C++ OO polymorphism inheritance
Darwin is an object oriented language. Object oriented programming is
supported by several features. To illustrate these notions we will use the
implementation of complex numbers in Darwin. The features supporting OO
programming are:
Data types/Classes - Arbitrary data types can be created dynamically by
using a functional notation, where the function name is the
data type name and the arguments are the components.
Complex( real_part, imaginary_part ) will be our data type to
hold complex numbers. The number 1 is then represented as
Complex(1,0). Complex(0,1) denotes the imaginary unit. See
?Complex for full details of this example.
Constructors - A constructor of the Data type is any function/method or
operation that will produce as a result a new object of the
given type. It is customary to use the name of the data type
as a constructor. This has several advantages: readability,
simpler name space, and the possibility of having a checker/
normalizer.
When the data type has type restrictions in its components,
this type checking can be done automatically by defining the
constructor function as a function with argument type
checking. For example, if we want our arguments of the
Complex type to be numeric, we can do this by defining:
Complex := proc( Re:numeric, Im:numeric )
. . . .
end:
The result of a call to Complex(a,b) (which is now a function
too) should be the structure Complex(a,b). Technically,
Complex(a,b) must evaluate first, to do parameter checking
and other normalizations, and then return unevaluated. This
is achieved with the noeval() function. Noeval assembles a
data type without calling the function. The above example
becomes:
Complex := proc( Re:numeric, Im:numeric )
noeval( Complex(Re,Im) )
end:
Normalizers - The constructor function/method could perform extra checks or
simplification of the data type if this is desired. In the
case of complex numbers, it may be desirable to simplify
Complex data types with a 0 imaginary part to a simple
numerical value. E.g.
Complex := proc( Re:numeric, Im:numeric )
if Im=0 then Re else noeval( Complex(Re,Im) ) fi
end:
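Assuming the normalizing constructor just above has been defined, calls
would be expected to behave as follows (a sketch derived from the
definition, not captured output):
Complex(3,4);    # remains the structure Complex(3,4)
Complex(3,0);    # Im=0, so the plain numeric 3 is returned
Complex(3,'x');  # fails the Im:numeric argument check with an error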
Selectors - Selectors are used in two main modes, to select part of the
data type or to modify part of the data type. Selectors are
handled by the kernel or by user functions. Integer
selectors or selectors with the names of the parameters of
the data type are handled by the kernel. Other selectors are
handled by a function/method named like the data type
concatenated with the string '_select'. The selector
function is passed the object and the selection argument and,
optionally, the value to be assigned. Selectors which are
positive integers or a range of positive integers are
computed directly, and operate on the corresponding component
of the data type.
Complex := proc( Re:numeric, Im:numeric ) ... end:
a := Complex(7,-3);
a[xxx] - Identical to Complex_select(a,xxx)
a[1] - Is 7, without any function calls.
a[Re] - is 7, without any function calls.
a[Im] := -1; - Will change a so that the second
component is changed to -1. This is
done by the kernel.
a[yyy] := 1; - Will be handled by calling
Complex_select(a,yyy,1). The return value
is ignored in this case. Complex_select
will normally modify the data type.
a[2] := 3; - The value 3 is assigned to the second
component of a without any call.
a[1..2] - Is the expression sequence 7,-3,
without any function calls.
a[Im] := []; - Gives an error, since the type of the
second argument does not match the
assigned value.
Integer selectors tie the code to the position of each
component, which prevents the use of generic data types and
object orientation, so their use is discouraged. Note that
the function 'op' is equivalent to selecting with integers.
Using the names of the parameters as the selectors provides
type checking (on assignments) and is performed by the
kernel, hence it is very efficient. If the integrity of the
whole data structure needs to be checked, then the user must
write a xxx_select function to run any desired check and/or
the option NormalizeOnAssign should be specified in the
constructor.
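A minimal sketch of a user-defined selector function follows. The selector
Modulus is hypothetical and chosen only for illustration, and the use of
nargs to detect the assignment form is an assumption about how the optional
third argument is best tested:
Complex_select := proc( c:Complex, sel, val )
  if sel = 'Modulus' then
    # read-only selector: refuse the three-argument (assignment) form
    if nargs = 3 then error('Modulus cannot be assigned') fi;
    sqrt( c[Re]^2 + c[Im]^2 )
  else error('invalid selector for Complex') fi
end:
a := Complex(3,4):
a['Modulus'];    # not a parameter name, so Complex_select is invoked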
Converters - A converter is a function/method which converts one data type
into another. For data types A and B, the function B_A
should convert a B object into an A object. When the data
type A is defined with option polymorphic, then this
conversion (calling B_A), is done automatically for any use
of A(B(..)). It is common, and very useful, to have
converters to basic types in the system, like string. The
function/method B_A will be called with the object B as
argument. E.g.
Complex_string := proc( C:Complex )
sprintf( '%g+%gi', C[Re], C[Im] ) end:
Polar_Complex := proc( p:Polar )
Complex( p[rho]*cos(p[theta]),
p[rho]*sin(p[theta]) ) end:
Operations - A function/method which is defined with option polymorphic is
able to handle arbitrary objects. If the function is named
f, then when f is called with a single object of type A, A_f
will be called. E.g.
f := proc( x:numeric ) option polymorphic; x+1 end:
Complex_f := proc( x:Complex )
Complex( x[Re]+1, x[Im] ) end:
Most system functions have option polymorphic, in particular
all arithmetic operations. Complex_plus, Complex_times and
Complex_power will handle all arithmetic operations with
Complex data types. (Subtraction and division are handled by
multiplication by -1 and powering to -1). It is very useful
to implement the following methods for a data type A:
A_plus
A_times
A_power
A_print
A_printf
A_string
A_equal
A_Rand
A_type
A_example
A_Description
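As a sketch of one such method, addition for Complex could be written as
below. This assumes that mixed numeric/Complex sums reach Complex_plus with
the operands in either order, which is not stated above, so both cases are
handled defensively:
Complex_plus := proc( a, b )
  if type(a,numeric) then Complex( a+b[Re], b[Im] )
  elif type(b,numeric) then Complex( a[Re]+b, a[Im] )
  else Complex( a[Re]+b[Re], a[Im]+b[Im] ) fi
end: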
Type testing - Type testing can be done by a type-testing expression or by a
type-testing procedure. In both cases, the symbol
Complex_type is assigned a value. Type testing expressions are
powerful enough for most uses. E.g.
Complex_type := noeval( Complex(numeric,numeric) );
Notice that a noeval is needed, since the arguments of the
Complex type are not valid as a complex number, and hence
would give an error if evaluated. With this definition type
testing can be done explicitly or implicitly. E.g.
if type(a,Complex) then ....
Complex_plus := proc( a:Complex, b:Complex ) ...
Inheritance - Inheritance is the ability to instruct the system that a
certain data type is equivalent to, or a super-set of, another,
and hence operations do not need to be redefined. More
precisely, let A and B be data types and assume that A is
either equivalent to or a super-set of B. The command
Inherit( A, B );
is interpreted as: A will inherit any operation from B which
is not defined for A. This operation will be appropriately
modified so that it works with A objects. For example, we
can define the data type Polar, which is also a complex
number. So Polar is equivalent to Complex. Polar will have
some special selectors, and some operations which can be
performed more efficiently in this representation (e.g.
multiplication, powering and absolute value). The rest of
the operations can be inherited from Complex. The definition
of Polar could be:
Polar := proc( rho:numeric, theta:numeric )
option polymorphic;
... normalizations, error checking, etc. ...
noeval( Polar(args) ) end:
Polar_select := proc( a:Polar, s, val )
# selectors must include Re and Im so that it
# can work as a Complex
. . . . end:
Polar_times := proc( a:Polar, b:Polar )
Polar( a[rho]*b[rho], a[theta]+b[theta] ) end:
Polar_abs := proc( p:Polar ) p[rho] end:
Inherit( Polar, Complex );
CompleteClass( Polar );
The function CompleteClass performs checking and some level
of completion of a class. For example, it will find that
there is no type definition for Complex, but there is enough
information (from the types of the parameters) to construct a
primitive checker. In this example it will also create a
Complex_Rand function which creates random instances of
Complex. It is recommended that CompleteClass is run after a
class is defined.
Organization - All functions/methods related to a data type, i.e. all
functions with names Complex_xxx, should be stored in the
library in a single file, ideally named "Complex". The
symbol Complex should be assigned an unevaluated ReadLibrary
command. E.g.
Complex := noeval( ReadLibrary(Complex) ):
or if the functions are stored in 'mylibrary/Complex',
Complex := noeval( ReadLibrary( 'mylibrary/Complex',
Complex )):
See also: ?option ?Complex ?type ?Inherit ?selectors ?ReadLibrary
index
Index of topics available under this help system. Type ?xxxx
on a single line to obtain the help on xxxx
AAAP AAAToInt AaCount AaFreqNoPat AaFrequency
abbreviation abs AC ACS ActOut
AddDeviation AddGF AddSpecies Align AlignedIntrons
AlignedPeptide AlignedSeq AlignGaps Alignment AlignNucPepAll
AlignNucPepMatc AlignOneAll AllIndices AllMatches AllRootedTrees
AllTernaryRoots AltGenCode amino acids AminoP AminoToInt
antiparallel AP APC append AppendFile
appendto ApproxSearchStr ApprTextSearch arcsin arctan
AsciiToInt assemble assert assign assigned
AToCInt AToCodon AToGenCode atoi AToInt
AtomicVolume avg BackDynProgr BackTranscribe BackTranslate
BaseCount BaseP bases BaseToInt BBBC
BBBP BBBToInt BestPamShake BestSearchStrin BestStringMatch
Beta_Rand BFGSMinimize Binomial_Rand BinTree BipartiteGraph
BipartiteSquare BirthDeathTree BisectTree Block BootstrapTree
BP BrightenColor BToInt CalculateScore CallSystem
CaseSearchStrin CBBB ceil CenterTreeRoot ChangeLeafLabel
CheckAmbigTree ChiSquare_Rand Cholesky ChouFasman CIntToA
CIntToAAA CIntToAmino CIntToCodon CIntToInt CircularTour
CleanMSA clearw Clique CloseGF Clustal
ClustalMSA ClusterRelPam Clusters Code CodeToNuc
CodonAlign CodonCode CodonCount CodonDynProgStr CodonMatrix
CodonMutate CodonPamToPam CodonToA CodonToAAA CodonToAmino
CodonToCInt CodonToInt CodonUsage coeff Collapse
CollapseNodes CollectStat Color ColorMap ColorPalette
ColorTree command commandline Complement ComplementSeque
Complex compress CompressGF ComputeCAI ComputeCAIVecto
ComputeCubicTSP ComputeDimensio ComputeQuarticT ComputeTPI ComputeTSP
ConcatStrings ConnectTcp ConsistentGenom conversion ConvertSP
ConvertToDF convolve copy cor cos
Counter Covariance CreateArray CreateCodonMatr CreateCodonMode
CreateDayMatric CreateDayMatrix CreateGF CreateMSAMethod CreateOrigDayMa
CreateParametri CreateRandMultA CreateRandPermu CreateRandSeq CreateString
CreateSynMatric CreateTreeConst CreateTreeConst CreateTreeStati CT_Species
Cumulative CumulativeStd CurrentOff currentOfs darwin
darwinipc data structures database DataMatrix date
Dayhoff DayhoffM DayMatrix DayMatrixScale DBL_EPSILON
DBL_MAX DbToDarwin debug decompress defaults
DelFixed DelIncr Denormalize Description DF
DigestAspN digester DigestionWeight digestor DigestSeq
DigestSeqs DigestTrypsin DigestWeights disassemble DisconMinimize
DisconnectTcp Distribution DM DMDMS DMS
DnaFile DNAPepDayhoffM DocEl Document DoGapHeuristic
DotPlot DownloadURL dprint dpuTime DrawDistributio
DrawDotplot DrawGraph DrawHistogram DrawPlot DrawPointDistri
DrawSplitGraph DrawSplits DrawStackedBar DrawTree dSplitGraph
dSplitIndex dSplitMetricSum dSplits DynProgGap DynProgMass
DynProgMassDb DynProgNucPepSt DynProgr DynProgScore DynProgStrings
Edge EdgeComplement Edges Eigenvalues eigenvalues
EndOverlayPlot EnterProfile Entries Entropy Entry
EntryInfo EntryNumber enum enzyme enzymes
EOF EqradTree erf erfc erfcinv
ERROR error EstimateCodonPA EstimateNG86 EstimatePam
EstimatePB93 EstimateSynPAM eval evalb EvolTree
exit ExitProfile exp ExpandFileName ExpFit
ExpFit2 Exponential_Ran ExponFit ExponFit2 expx1
ExtCallFrame ExtendClass externcall factorial Fauchere
FDist_Rand fields FileStat FindCircularOrd FindConnectedCo
FindEntropy FindHighlyExpre findkey FindLongestRep FindNucPepPam
FindRules FindSpeciesViol floor FlushGF FragSearch
FreeEnergy function Gamma GammaDist_Rand Gap
GapHeuristic GapMatch GapTree GaussElim gausselim
gc gcd gcm GenCode GenCodeToInt
Gene genetic code GenomeSummary Geometric_Rand GetAaCount
GetAaFrequency GetAllNucPepMat GetBetween GetComplement GetEntryInfo
GetEntryNumber GetFileInfo GetGramRegionSc GetGramSiteScor GetIndex
GetIntrons GetLabels GetLcaSubtree GetMachineUsage GetMATreeNew
GetMolWeight GetMostFrequent GetOffset GetPam GetPartitions
GetPath GetPathDistance GetPeptides getpid GetPosition
GetSubTree_r GetTreeLabels GetTreeLength gigahertz GivensElim
GlobalNucPepAli Globals GOdefinition GOdownload GOname
GOnumber GOsubclass GOsubclassR GOsuperclass GOsuperclassR
GramRegion GramSchmidt GramSite Graph Graph_minus
Graph_Rand Graph_XGMML HammingSearchAl HammingSearchSt has
hash hastype help Histogram History
hostname HTMLColor HTMLColorprint HTMLCols HTMLprint
HTMLRows HTMLTitle hydrophobicity i/o ID
IdenticalTrees Identity IDS If ilogb
indets indexing InduceGraph Infix InfixNr
Inherit input input/output Interior InteriorTot
intersect IntOut IntraDistance Intron IntronModel
IntToA IntToAAA IntToAmino IntToAscii IntToB
IntToBase IntToBBB IntToCInt IntToCodon IntToGenCode
invlogit io IPCconnect IPCdisconnect IPCread
IPCreceive IPCreceiveDATA IPCsend ipcsend IPCsendDATA
iquo islower IsolationIndex isupper iterate
json kGramRegion kGramRegionScor kGramSite kGramSiteScore
KHTest KWIndex LabelTree LarsonTree lasterror
latex lcoeff Leaf LeastSquaresTre Leaves
length lg libname LinearClassific LinearClassify
LinearIntron LinearProgrammi LinearRegressio Lines LinRegr
List list ln ln1x LnGamma
Lngamma lnProbBallsBoxe load LoadFile LoadMatrixFile
local LocalNucPepAlig LocalNucPepAlig LockFile log
log10 logit LongestRep LongInteger lowercase
lprint LSBestDelete LSBestSum LSBestSumDelete Machine
MachineUsage MafftMSA MAlign MAlignment MapGF
MapleFormula MassDyn MassDynAll MassProfile Match
MatchRegex Matrices matrix matrix_inverse max
MaxCut MaxEdgeWeightCl Maximize MaximizeFunc MaximizeRD
MaxLikelihoodSi MaxSimil median member min
MinCut Minimize Minimize2D Minimize2DFunc MinimizeBrent
MinimizeFunc MinimizeSD Minimizex MinSimil MinSqTree
MinSquareTree minus MLTopoTest mod MolWeight
MostFrequent MoveGap MSAMethod MSAStatistics mselect
MST MultAlign MultiAlign Multinomial_Ran MultipleSubTree
Mutate MySql MySqlResult name convention names
naming NBody NDF NewArray NewString
NextGF Nodes noeval Normalize Normal_Rand
NPAlignMatch NPAllMatches NPBackDynProgr NPBestPamMatch NPBestPamShake
NPDynProgr NPMatch NPMultiAllMatch NPOneAllMatch NPRefine
NPRefineShake NPRegions NPSprintMatch NSubGene NToInt
NucDB NucleicToInt nucleotides NucPepBackDynPr NucPepDynProg
NucPepMatch NucPepRegions NucToCode NucToInt NULL
numeric Offsets OneAllMatch op OpenAppending
OpenGF OpenPipe OpenReading OpenWriting option
optional optional parame options OrderedSearch OrthologousGrou
Orthologues output OutsideBounds PA PAAA
PAmino PamMax PamToCodonPam PamToPerIdent PamtoPI
PamWindows Paragraph ParallelAllNucP parameters ParExec
ParExec2 ParExecuteIPC ParExecuteTest parse ParseDimacsGrap
ParseNewickTree ParsePred ParTest PartialFraction Partitions
Partitions_GetC Partitions_GetT Partitions_Reso PASfromMSA PASfromTree
PatEntries PatEntry Path PB PBase
PBBB PDF PepDB PepPepSearch peptides
PerIdentToPam Permutation PhylogeneticTre PhyloTree PhyML
Pi PickTree PItoPam plot Plot2Gif
PlotArguments PlotIndex PlotOptions PlotPam Poisson_Rand
Polar PolishAngles PosInfo PositionDF PositionTree
Postfix PostscriptFigur Predict PredictGenes Prefix
Primes print printf PrintIndex PrintInfo
printlevel PrintMatrix prints PrintSeqsInTree PrintStringMatc
PrintTreeSeq ProbAncestor ProbBallsBoxes ProbCloseMatche ProbDynProg
ProbDynProgr ProbIndex ProbSeq proc procedure
Process product profile ProfileEnter ProfileExit
profiling Protect Protein PruneTree PSDynProg
PSubGene QueryAll QueryGF Rand RandomPermut
RandomSeq RandomTrees RandTree Rank RAxML
RBFS_Tree read Readability ReadBrk readBRK
ReadData ReadDb ReadDssp readDSSP ReadFasta
readfile ReadLibrary ReadLine ReadOffsetLine ReadPhylip
readpipelines ReadProgram ReadRawFile ReadRawLine readstat
readstatAt ReadTable ReadTcp ReadURL ReceiveDataTcp
ReceiveTcp ReconcileTree RedoCompletion Refine RefineLog
RefineShake regexp Region RegularGraph RelativeAdaptiv
RellTree remember RenderTemplate ReplaceString RETURN
return returntype Reverse RGB_string RobinsonFoulds
Roman Romberg RotateTree round RSCU
RunDarwinSessio SameTree SaveEntries scalb Scale
ScaleIndex ScaleTree ScoreAlignment ScoreIntron Scramble
SearchAC SearchAllArray SearchAllString SearchArray SearchDayMatrix
SearchDb SearchDelim SearchDF SearchFrag SearchID
SearchMassDb SearchMultipleS SearchOrderedAr SearchPepAll SearchPepDF
SearchSeqDb SearchString SearchTag SearchText searchtext
selection selector selectorfunctio SendDataTcp SendTcp
seq sequal Sequence Sequences ServerSocket
Set set SetRand SetRandSeed SetupRA
SetuptRNA sha2 ShortestPath ShortestPath2 Shuffle
sign Signature SignedSynteny Simil sin
size sleep SmallAllAll Smooth sort
SortedMA SPCommonName specfunc SpeciesCode Species_Entry
SplatTree SplitLines sprintf SprintMatch SpToDarwin
SP_Species sqrt srand sscanf Ssystem
StackedBar Stat Stats StatTest std
Std_Score string Strings string_RGB structures
Student_Rand SubDist subs subset SubTree
sum SummarizeTree Surface SurfaceTot SurfIntActPred
SurfOut SvdAnalysis SvdBestBasis SvdResult symbol
Synteny system SystemCommand Table table
tan TaxonId TaxonomyDownloa TaxonomyEntry TempName
TestGradHessian TestStatResult TetrahedronGrap Text text
TextBlock TextHead time TimedCallSystem TotalAlign
TotalTreeWeight TPIDistr Transcribe Translate translation
transpose traperror Tree TreeAngles TreeConstructio
TreeOrder TreeResult TreeSize TreeStatistics TreeToPam
Tree_Graph Tree_matrix trim TrulyRandom trunc
TSP TSP3 TSP4 TT type
UnassignGlobals UnCompressGF union UnionFind UnionStats
UnLabelTree update UpdateSpeciesCo UpdateStat uppercase
UTCTime UUUP var version VertexCover
View ViewPlot Violations VisualizeProtei warning
WeightObservati WriteBlock WriteData WriteFasta WriteFile
WriteMSA WriteSeqXML writeto Zeta zip
Zscore