LibSBML uses Abstract Syntax Trees (ASTs) to provide a canonical, in-memory representation for all mathematical formulas regardless of their original format (i.e., C-like infix strings or MathML). In libSBML, an AST is a collection of one or more objects of class ASTNode. An AST node in libSBML is a recursive structure containing a pointer to the node's value (which might be, for example, a number or a symbol) and a list of child nodes. Each ASTNode may have zero, one, two, or more children, depending on its type. The following diagram illustrates how the mathematical expression "1 + 2" is represented as an AST with one plus node having two integer child nodes for the numbers 1 and 2. The figure also shows the corresponding MathML 2.0 representation:
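[Figure: the AST for "1 + 2", a plus node with two integer child nodes (1 and 2), shown alongside its MathML 2.0 representation]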
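The following points are noteworthy about the AST representation in libSBML:
- A numerical value represented in MathML as a real number with an exponent is preserved as such in the AST node representation, even if the number could be stored in a double data type. This is done so that when an SBML model is read in and then written out again, the amount of change introduced by libSBML to the SBML during the round-trip activity is minimized.
- Rational numbers are represented in an AST node using separate numerator and denominator values. These can be retrieved using the getNumerator() and getDenominator() methods provided by the ASTNode class.
- The children of an ASTNode object are other ASTNode objects. The list of children is empty for nodes that are leaf elements, such as numbers. For nodes that are actually roots of expression subtrees, the list of children points to the parsed objects that make up the rest of the expression.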
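To make the value-plus-children structure concrete, here is a minimal Java sketch of the "1 + 2" tree. ToyAstNode is a hypothetical stand-in written only for illustration, not libSBML's ASTNode: it stores every value as a string and models nothing beyond a node value and an ordered list of children, where leaves have an empty list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of an AST node (NOT libSBML's ASTNode):
// each node holds a value and a list of child nodes.
class ToyAstNode {
    final String value;                                   // operator symbol or number
    final List<ToyAstNode> children = new ArrayList<>();  // empty for leaf nodes

    ToyAstNode(String value) {
        this.value = value;
    }

    // Appends a child and returns this node so calls can be chained.
    ToyAstNode addChild(ToyAstNode child) {
        children.add(child);
        return this;
    }

    boolean isLeaf() {
        return children.isEmpty();
    }

    // Joins the children with the operator; leaves print their own value.
    // No parentheses are emitted, so this is only meant for flat trees.
    String toInfix() {
        if (isLeaf()) {
            return value;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) {
                sb.append(' ').append(value).append(' ');
            }
            sb.append(children.get(i).toInfix());
        }
        return sb.toString();
    }

    // Builds "1 + 2": one plus node with two integer leaf children.
    static ToyAstNode onePlusTwo() {
        return new ToyAstNode("+")
                .addChild(new ToyAstNode("1"))
                .addChild(new ToyAstNode("2"));
    }

    public static void main(String[] args) {
        ToyAstNode root = onePlusTwo();
        System.out.println(root.toInfix());                 // 1 + 2
        System.out.println(root.children.size());           // 2
        System.out.println(root.children.get(0).isLeaf()); // true
    }
}
```

The real ASTNode additionally carries a type code and typed values (integers, rationals, reals with exponents), as described in the rest of this section.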
For many applications, the details of ASTs are irrelevant because the applications can use libSBML's text-string based translation functions such as libsbml.formulaToString(ASTNode) and libsbml.parseFormula(java.lang.String). If you find the complexity of using the AST representation of expressions too high for your purposes, perhaps the string-based functions will be more suitable.
Finally, it is worth noting that the AST and MathML handling code in libSBML remains written in C, not C++, as all of libSBML was originally written in C. This is why some aspects of the API follow a C-like rather than a C++ style.
SBML Level 2 represents mathematical expressions using MathML 2.0 (more specifically, a subset of the content portion of MathML 2.0), but most software applications using libSBML do not use MathML directly. Instead, applications generally either interact with mathematics in text-string form, or else they use the API for working with Abstract Syntax Trees (described below). LibSBML provides support for both approaches. The libSBML formula parser has been carefully engineered so that transformations from MathML to infix string notation and back are possible with a minimum of disruption to the structure of the mathematical expression.
The example below shows a simple program that, when run, takes a MathML string compiled into the program, converts it to an AST, converts that to an infix representation of the formula, compares it to the expected form of that formula, and finally translates that formula back to MathML and displays it. The output displayed on the terminal should have the same structure as the MathML it started with. The program is a simple example of using the various MathML and AST reading and writing methods, and shows that libSBML preserves the ordering and structure of the mathematical expressions.
```java
import org.sbml.libsbml.ASTNode;
import org.sbml.libsbml.libsbml;

public class example {
  public static void main (String[] args) {
    String expected = "1 + f(x)";
    String input_mathml = "<?xml version='1.0' encoding='UTF-8'?>"
                        + "<math xmlns='http://www.w3.org/1998/Math/MathML'>"
                        + " <apply> <plus/> <cn> 1 </cn>"
                        + " <apply> <ci> f </ci> <ci> x </ci> </apply>"
                        + " </apply>"
                        + "</math>";

    // MathML -> AST; a null result means the MathML could not be read.
    ASTNode ast_result = libsbml.readMathMLFromString(input_mathml);
    if (ast_result == null) {
      System.out.println("readMathMLFromString() failed.");
      System.exit(1);
    }

    // AST -> infix text string, compared against the expected form.
    String ast_as_string = libsbml.formulaToString(ast_result);
    if (ast_as_string.equals(expected)) {
      System.out.println("Got expected result.");
    } else {
      System.out.println("Mismatch after readMathMLFromString().");
      System.exit(1);
    }

    // Infix text string -> AST -> MathML, printed for comparison with the input.
    ASTNode new_mathml = libsbml.parseFormula(ast_as_string);
    String new_string = libsbml.writeMathMLToString(new_mathml);
    System.out.println("Result of writing AST to string:");
    System.out.print(new_string);
    System.out.println();
  }

  static {
    try {
      System.loadLibrary("sbmlj");
    } catch (UnsatisfiedLinkError e) {  // thrown when the native library cannot be found
      System.err.println("Could not load libSBML library: " + e.getMessage());
      System.exit(1);
    }
  }
}
```
The text-string form of mathematical formulas produced by libsbml.formulaToString(ASTNode) and read by libsbml.parseFormula(java.lang.String) is a simple C-inspired infix notation taken from SBML Level 1. It is summarized in the next section. A formula in this text-string form can therefore be handed to a program that understands SBML Level 1 mathematical expressions, or used as part of a translation system. In summary, the functions available are the following:
- static java.lang.String libsbml.formulaToString(ASTNode) reads an AST, converts it to a text string in SBML Level 1 formula syntax, and returns it. In the Java interface the result is an ordinary java.lang.String, so the caller does not need to free it.
- static ASTNode libsbml.parseFormula(java.lang.String) reads a text string containing a mathematical expression in SBML Level 1 syntax, and returns an AST corresponding to the expression.
The text-string formula syntax is an infix notation essentially derived from the syntax of the C programming language and was originally used in SBML Level 1. The formula strings may contain operators, function calls, symbols, and white space characters. The allowable white space characters are tab and space. The following are illustrative examples of formulas expressed in the syntax:
0.10 * k4^2
(vm * s1)/(km + s1)
The following table shows the precedence rules in this syntax. In the Class column, operand implies the construct is an operand, prefix implies the operation is applied to the following arguments, unary implies there is one argument, and binary implies there are two arguments. The values in the Precedence column show how the order of different types of operations is determined. For example, the expression a * b + c is evaluated as (a * b) + c because the * operator has higher precedence. The Associates column shows how the order of operations with the same precedence is determined; for example, a - b + c is evaluated as (a - b) + c because the + and - operators are left-associative. The precedence and associativity rules are taken from the C programming language, except for the symbol ^, which is used in C for a different purpose. (Exponentiation can be invoked using either ^ or the function power.)
Token | Operation | Class | Precedence | Associates |
---|---|---|---|---|
name | symbol reference | operand | 6 | n/a |
( expression) | expression grouping | operand | 6 | n/a |
f( ...) | function call | prefix | 6 | left |
- | negation | unary | 5 | right |
^ | power | binary | 4 | left |
* | multiplication | binary | 3 | left |
/ | division | binary | 3 | left |
+ | addition | binary | 2 | left |
- | subtraction | binary | 2 | left |
, | argument delimiter | binary | 1 | left |
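
The operator rows of the table can be exercised with a small precedence-climbing parser. The sketch below is hypothetical: PrecedenceSketch is not part of libSBML and is not libSBML's formula parser. It handles names, numbers, parentheses, negation, ^, *, /, + and -, applies the precedence and associativity values exactly as listed in the table, and returns a fully parenthesized string so the resulting grouping is visible. Function calls and the comma delimiter are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the operator rows of the precedence table (NOT libSBML's parser).
// group() returns a fully parenthesized string that shows how the formula is grouped.
class PrecedenceSketch {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private PrecedenceSketch(String formula) {
        int i = 0;
        while (i < formula.length()) {
            char c = formula.charAt(i);
            if (c == ' ' || c == '\t') {                 // the two allowed white space characters
                i++;
            } else if ("+-*/^()".indexOf(c) >= 0) {     // single-character tokens
                tokens.add(String.valueOf(c));
                i++;
            } else {                                     // a name or a number such as 0.10
                int start = i;
                while (i < formula.length() && (Character.isLetterOrDigit(formula.charAt(i))
                        || formula.charAt(i) == '_' || formula.charAt(i) == '.')) {
                    i++;
                }
                if (start == i) {
                    throw new IllegalArgumentException("Unexpected character: " + c);
                }
                tokens.add(formula.substring(start, i));
            }
        }
    }

    // Binary precedence from the table: ^ is 4, * and / are 3, + and - are 2.
    private static int binaryPrecedence(String op) {
        switch (op) {
            case "^": return 4;
            case "*": case "/": return 3;
            case "+": case "-": return 2;
            default: return -1;                          // not a binary operator
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // Precedence climbing; parsing the right operand at (precedence + 1)
    // makes every binary operator left-associative, as the table specifies.
    private String parseExpr(int minPrec) {
        String left = parseUnary();
        while (peek() != null && binaryPrecedence(peek()) >= minPrec) {
            String op = tokens.get(pos++);
            String right = parseExpr(binaryPrecedence(op) + 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    // Negation has precedence 5 (above ^) and is right-associative.
    private String parseUnary() {
        if ("-".equals(peek())) {
            pos++;
            return "(-" + parseUnary() + ")";
        }
        return parsePrimary();
    }

    // A name, a number, or a parenthesized expression (precedence 6).
    private String parsePrimary() {
        String tok = peek();
        if (tok == null || "+-*/^)".contains(tok)) {
            throw new IllegalArgumentException("Expected an operand but found: " + tok);
        }
        pos++;
        if (!tok.equals("(")) {
            return tok;
        }
        String inner = parseExpr(1);
        if (!")".equals(peek())) {
            throw new IllegalArgumentException("Missing ')'");
        }
        pos++;
        return inner;
    }

    static String group(String formula) {
        PrecedenceSketch p = new PrecedenceSketch(formula);
        String result = p.parseExpr(1);
        if (p.peek() != null) {
            throw new IllegalArgumentException("Unexpected token: " + p.peek());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(group("a * b + c"));     // ((a * b) + c)
        System.out.println(group("0.10 * k4^2"));   // (0.10 * (k4 ^ 2))
    }
}
```

Because the table ranks negation above ^, this sketch groups -a^2 as ((-a) ^ 2); a parser that ranked ^ higher would instead produce (-(a ^ 2)).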
A program parsing a formula in an SBML model should assume that names appearing in the formula are the identifiers of Species, Compartment, Parameter, FunctionDefinition, or Reaction objects defined in a model. When a function call is involved, the syntax consists of a function identifier, followed by optional white space, followed by an opening parenthesis, followed by a sequence of zero or more arguments separated by commas (with each comma optionally preceded and/or followed by zero or more white space characters), followed by a closing parenthesis.
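The function-call rule can be sketched as follows. FunctionCallSketch is a hypothetical helper, not a libSBML API: given a single call, it takes the identifier before the opening parenthesis (ignoring optional white space) and splits the text between the parentheses at top-level commas only, so a nested call such as g(y, z) stays intact as one argument. It assumes the input is one well-formed call and does not validate identifiers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (NOT a libSBML API): splits one function call written in the
// text-string syntax into its identifier and its top-level argument strings.
class FunctionCallSketch {
    final String name;
    final List<String> args = new ArrayList<>();

    FunctionCallSketch(String call) {
        String text = call.trim();
        int open = text.indexOf('(');
        if (open <= 0 || !text.endsWith(")")) {
            throw new IllegalArgumentException("Not a function call: " + call);
        }
        // The identifier may be followed by optional white space before '('.
        name = text.substring(0, open).trim();
        String body = text.substring(open + 1, text.length() - 1);
        if (body.trim().isEmpty()) {
            return;                                   // zero arguments, e.g. "f()"
        }
        int depth = 0;                                // parenthesis nesting level
        int start = 0;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {      // only top-level commas separate arguments
                args.add(body.substring(start, i).trim());
                start = i + 1;
            }
        }
        args.add(body.substring(start).trim());       // the last argument
    }

    public static void main(String[] argv) {
        FunctionCallSketch call = new FunctionCallSketch("f ( x, g(y, z) )");
        System.out.println(call.name);   // f
        System.out.println(call.args);   // [x, g(y, z)]
    }
}
```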
There is an almost one-to-one mapping between the list of predefined functions available and those defined in MathML. All of the MathML functions are recognized; this set is larger than the functions defined in SBML Level 1. In the subset of functions that overlap between MathML and SBML Level 1, there exist a few differences. The following table summarizes the differences between the predefined functions in SBML Level 1 and the MathML equivalents in SBML Level 2:
Text string formula functions | MathML equivalents in SBML Level 2 |
---|---|
acos | arccos |
asin | arcsin |
atan | arctan |
ceil | ceiling |
log | ln |
log10(x) | log(10, x) |
pow(x, y) | power(x, y) |
sqr(x) | power(x, 2) |
sqrt(x) | root(2, x) |
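
As an illustration of the table above, the following hypothetical helper (Level1FunctionMap.toLevel2 is not part of libSBML) rewrites a single Level 1 function call into the notation used in the right-hand column. It only manipulates strings and does not produce MathML; names not listed in the table pass through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (NOT a libSBML API): rewrites one SBML Level 1 function call
// into the notation of the "MathML equivalents in SBML Level 2" column.
class Level1FunctionMap {
    // Functions whose only difference is the name.
    private static final Map<String, String> RENAMED = new HashMap<>();
    static {
        RENAMED.put("acos", "arccos");
        RENAMED.put("asin", "arcsin");
        RENAMED.put("atan", "arctan");
        RENAMED.put("ceil", "ceiling");
        RENAMED.put("log", "ln");
    }

    static String toLevel2(String name, String... args) {
        switch (name) {
            case "log10": return "log(10, " + args[0] + ")";
            case "pow":   return "power(" + args[0] + ", " + args[1] + ")";
            case "sqr":   return "power(" + args[0] + ", 2)";
            case "sqrt":  return "root(2, " + args[0] + ")";
            default:
                // Renamed functions keep their arguments; any other name passes through unchanged.
                return RENAMED.getOrDefault(name, name) + "(" + String.join(", ", args) + ")";
        }
    }

    public static void main(String[] argv) {
        System.out.println(toLevel2("log10", "x"));   // log(10, x)
        System.out.println(toLevel2("acos", "x"));    // arccos(x)
    }
}
```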
Every ASTNode in a libSBML abstract syntax tree has an associated type, which is a value taken from a set of constants having names beginning with AST_ and defined in org.sbml.libsbml.libsbmlConstants.
The list of possible AST types in libSBML is quite long, because it covers all the mathematical functions that are permitted in SBML. The values are shown in the following table; their names hopefully evoke the construct that they represent:
AST type codes | | |
---|---|---|
AST_UNKNOWN | AST_FUNCTION_ARCCOTH | AST_FUNCTION_POWER |
AST_PLUS | AST_FUNCTION_ARCCSC | AST_FUNCTION_ROOT |
AST_MINUS | AST_FUNCTION_ARCCSCH | AST_FUNCTION_SEC |
AST_TIMES | AST_FUNCTION_ARCSEC | AST_FUNCTION_SECH |
AST_DIVIDE | AST_FUNCTION_ARCSECH | AST_FUNCTION_SIN |
AST_POWER | AST_FUNCTION_ARCSIN | AST_FUNCTION_SINH |
AST_INTEGER | AST_FUNCTION_ARCSINH | AST_FUNCTION_TAN |
AST_REAL | AST_FUNCTION_ARCTAN | AST_FUNCTION_TANH |
AST_REAL_E | AST_FUNCTION_ARCTANH | AST_LOGICAL_AND |
AST_RATIONAL | AST_FUNCTION_CEILING | AST_LOGICAL_NOT |
AST_NAME | AST_FUNCTION_COS | AST_LOGICAL_OR |
AST_NAME_TIME | AST_FUNCTION_COSH | AST_LOGICAL_XOR |
AST_CONSTANT_E | AST_FUNCTION_COT | AST_RELATIONAL_EQ |
AST_CONSTANT_FALSE | AST_FUNCTION_COTH | AST_RELATIONAL_GEQ |
AST_CONSTANT_PI | AST_FUNCTION_CSC | AST_RELATIONAL_GT |
AST_CONSTANT_TRUE | AST_FUNCTION_CSCH | AST_RELATIONAL_LEQ |
AST_LAMBDA | AST_FUNCTION_EXP | AST_RELATIONAL_LT |
AST_FUNCTION | AST_FUNCTION_FACTORIAL | AST_RELATIONAL_NEQ |
AST_FUNCTION_ABS | AST_FUNCTION_FLOOR | |
AST_FUNCTION_ARCCOS | AST_FUNCTION_LN | |
AST_FUNCTION_ARCCOSH | AST_FUNCTION_LOG | |
AST_FUNCTION_ARCCOT | AST_FUNCTION_PIECEWISE | |
There are a number of methods for interrogating the type of an ASTNode and for testing whether a node belongs to a general category of constructs. The methods defined by the ASTNode class are the following:
- int getType() returns the type of this AST node.
- boolean isConstant() returns true if this AST node is a MathML constant (true, false, pi, exponentiale), false otherwise.
- boolean isBoolean() returns true if this AST node returns a boolean value (by being either a logical operator, a relational operator, or the constant true or false).
- boolean isFunction() returns true if this AST node is a function (i.e., a MathML-defined function such as exp, or else a function defined by a FunctionDefinition in the Model).
- boolean isInfinity() returns true if this AST node is the special IEEE 754 value infinity.
- boolean isInteger() returns true if this AST node is holding an integer value.
- boolean isNumber() returns true if this AST node is holding any number.
- boolean isLambda() returns true if this AST node is a MathML lambda construct.
- boolean isLog10() returns true if this AST node represents the log10 function, specifically, if its type code is AST_FUNCTION_LOG and it has two children, the first of which is an integer equal to 10.
- boolean isLogical() returns true if this AST node is a logical operator (and, or, not, xor).
- boolean isName() returns true if this AST node is a user-defined name or (in SBML Level 2) one of the two special csymbol constructs "delay" or "time".
- boolean isNaN() returns true if this AST node has the special IEEE 754 value "not a number" (NaN).
- boolean isNegInfinity() returns true if this AST node has the special IEEE 754 value of negative infinity.
- boolean isOperator() returns true if this AST node is an operator (e.g., +, -, etc.).
- boolean isPiecewise() returns true if this AST node is the MathML piecewise function.
- boolean isRational() returns true if this AST node is a rational number having a numerator and a denominator.
- boolean isReal() returns true if this AST node is a real number (specifically, AST_REAL, AST_REAL_E or AST_RATIONAL).
- boolean isRelational() returns true if this AST node is a relational operator.
- boolean isSqrt() returns true if this AST node is the square-root operator.
- boolean isUMinus() returns true if this AST node is a unary minus.
- boolean isUnknown() returns true if this AST node's type code is AST_UNKNOWN.
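The category predicates can be modeled with a toy enum to show how they overlap. ToyType is hypothetical and covers only a subset of the AST_* codes (libSBML itself uses int constants from libsbmlConstants, not an enum); the sketch mirrors the meaning of isOperator(), isInteger(), isReal(), isNumber(), isConstant(), isLogical(), isRelational(), isUnknown() and isBoolean(), with isReal() treating REAL, REAL_E and RATIONAL as real numbers.

```java
import java.util.EnumSet;

// Hypothetical toy enum (NOT libSBML's API, which uses int constants from
// libsbmlConstants). It covers only a subset of the AST_* codes and mirrors the
// documented meaning of some ASTNode category predicates.
enum ToyType {
    UNKNOWN, PLUS, MINUS, TIMES, DIVIDE, POWER,
    INTEGER, REAL, REAL_E, RATIONAL, NAME,
    CONSTANT_E, CONSTANT_FALSE, CONSTANT_PI, CONSTANT_TRUE,
    LOGICAL_AND, LOGICAL_NOT, LOGICAL_OR, LOGICAL_XOR,
    RELATIONAL_EQ, RELATIONAL_GEQ, RELATIONAL_GT,
    RELATIONAL_LEQ, RELATIONAL_LT, RELATIONAL_NEQ;

    boolean isOperator()   { return EnumSet.of(PLUS, MINUS, TIMES, DIVIDE, POWER).contains(this); }
    boolean isInteger()    { return this == INTEGER; }
    // Real numbers in this sketch: REAL, REAL_E and RATIONAL.
    boolean isReal()       { return EnumSet.of(REAL, REAL_E, RATIONAL).contains(this); }
    boolean isNumber()     { return isInteger() || isReal(); }
    boolean isConstant()   { return EnumSet.of(CONSTANT_E, CONSTANT_FALSE, CONSTANT_PI, CONSTANT_TRUE).contains(this); }
    // range() relies on the declaration order of the constants above.
    boolean isLogical()    { return EnumSet.range(LOGICAL_AND, LOGICAL_XOR).contains(this); }
    boolean isRelational() { return EnumSet.range(RELATIONAL_EQ, RELATIONAL_NEQ).contains(this); }
    boolean isUnknown()    { return this == UNKNOWN; }

    // Boolean-valued: logical or relational operators, or the constants true/false.
    boolean isBoolean() {
        return isLogical() || isRelational() || this == CONSTANT_TRUE || this == CONSTANT_FALSE;
    }

    public static void main(String[] args) {
        for (ToyType t : new ToyType[] {PLUS, RATIONAL, CONSTANT_PI, RELATIONAL_LT}) {
            System.out.println(t + " operator=" + t.isOperator() + " number=" + t.isNumber()
                    + " boolean=" + t.isBoolean());
        }
    }
}
```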
Programs manipulating AST node structures should check the type of a given node before calling methods that return a value from the node. The following methods are available for returning values from nodes:
- int getInteger()
- char getCharacter()
- java.lang.String getName()
- int getNumerator()
- int getDenominator()
- double getReal()
- double getMantissa()
- int getExponent()
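The advice to check the type first can be sketched as follows. ToyNumberNode is hypothetical, not libSBML code: it keeps a separate field for each numeric representation, and toDouble() switches on the node's kind before reading only the fields that kind defines. Treating a REAL_E value as mantissa * 10^exponent is an assumption of this sketch.

```java
// Hypothetical sketch (NOT libSBML code): a toy numeric node that keeps a separate
// field for each representation and must be type-checked before its values are read.
class ToyNumberNode {
    enum Kind { INTEGER, RATIONAL, REAL, REAL_E, NAME }

    final Kind kind;
    int integer;                 // meaningful only for INTEGER
    int numerator, denominator;  // meaningful only for RATIONAL
    double real;                 // meaningful only for REAL
    double mantissa;             // meaningful only for REAL_E
    int exponent;                // meaningful only for REAL_E

    private ToyNumberNode(Kind kind) {
        this.kind = kind;
    }

    static ToyNumberNode ofInteger(int value) {
        ToyNumberNode n = new ToyNumberNode(Kind.INTEGER);
        n.integer = value;
        return n;
    }

    static ToyNumberNode ofRational(int numerator, int denominator) {
        ToyNumberNode n = new ToyNumberNode(Kind.RATIONAL);
        n.numerator = numerator;
        n.denominator = denominator;
        return n;
    }

    static ToyNumberNode ofReal(double value) {
        ToyNumberNode n = new ToyNumberNode(Kind.REAL);
        n.real = value;
        return n;
    }

    static ToyNumberNode ofRealE(double mantissa, int exponent) {
        ToyNumberNode n = new ToyNumberNode(Kind.REAL_E);
        n.mantissa = mantissa;
        n.exponent = exponent;
        return n;
    }

    static ToyNumberNode ofName() {
        return new ToyNumberNode(Kind.NAME);
    }

    // Checks the kind first, then reads only the fields that kind defines.
    static double toDouble(ToyNumberNode node) {
        switch (node.kind) {
            case INTEGER:  return node.integer;
            case RATIONAL: return (double) node.numerator / node.denominator;
            case REAL:     return node.real;
            case REAL_E:   return node.mantissa * Math.pow(10, node.exponent); // assumed meaning
            default:
                throw new IllegalArgumentException("Not a number node: " + node.kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(toDouble(ofInteger(7)));       // 7.0
        System.out.println(toDouble(ofRational(1, 4)));   // 0.25
        System.out.println(toDouble(ofRealE(1.5, 3)));    // 1500.0
    }
}
```

Calling toDouble(ofName()) throws instead of silently returning a meaningless field value, which is the failure mode the type check is meant to prevent.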
Of course, all of this would be of little use if libSBML didn't also provide methods for setting the values of AST node objects! And it does. The methods are the following:
- void setCharacter(char value) sets the value of this AST node to the given character. If the character is one of +, -, *, / or ^, the node type will be set to the appropriate operator type. For all other characters, the node type will be set to the type code AST_UNKNOWN.
- void setName(java.lang.String) sets the value of this AST node to the given name. The node type will be set (to AST_NAME) only if the AST node was previously an operator (isOperator() returns true) or a number (isNumber() returns true). This allows names to be set for AST_FUNCTION nodes and the like.
- void setValue(int value) sets the value of the node to the given integer, value, and sets the node type code to AST_INTEGER.
- void setValue(int numerator, int denominator) sets the value of this AST node to the given rational in two parts: the numerator and denominator. The node type code is set to AST_RATIONAL.
- void setValue(double value) sets the value of this AST node to the given double value and sets the node type code to AST_REAL.
- void setValue(double mantissa, int exponent) sets the value of this AST node to the given real number in two parts: the mantissa and the exponent. The node type code is set to AST_REAL_E.
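The type-code side effects described above can be mirrored in a small sketch. ToySetterNode is hypothetical, not libSBML's ASTNode, and its type field holds the AST_* constant names as plain strings purely for readability. As with the overloaded setters, Java picks setValue(int), setValue(int, int), setValue(double) or setValue(double, int) from the argument types at the call site.

```java
// Hypothetical sketch (NOT libSBML's ASTNode): records which AST_* type name each
// setter assigns, using plain strings purely for readability.
class ToySetterNode {
    String type = "AST_UNKNOWN";
    char character;
    int integer, numerator, denominator, exponent;
    double real, mantissa;

    // +, -, *, / and ^ select an operator type; any other character gives AST_UNKNOWN.
    void setCharacter(char value) {
        character = value;
        switch (value) {
            case '+': type = "AST_PLUS"; break;
            case '-': type = "AST_MINUS"; break;
            case '*': type = "AST_TIMES"; break;
            case '/': type = "AST_DIVIDE"; break;
            case '^': type = "AST_POWER"; break;
            default:  type = "AST_UNKNOWN";
        }
    }

    // Overloads chosen by argument types, each setting the matching type name.
    void setValue(int value) {
        integer = value;
        type = "AST_INTEGER";
    }

    void setValue(int numerator, int denominator) {
        this.numerator = numerator;
        this.denominator = denominator;
        type = "AST_RATIONAL";
    }

    void setValue(double value) {
        real = value;
        type = "AST_REAL";
    }

    void setValue(double mantissa, int exponent) {
        this.mantissa = mantissa;
        this.exponent = exponent;
        type = "AST_REAL_E";
    }

    // Type name a fresh node ends up with after one setCharacter() call.
    static String typeAfterCharacter(char c) {
        ToySetterNode n = new ToySetterNode();
        n.setCharacter(c);
        return n.type;
    }

    public static void main(String[] args) {
        ToySetterNode n = new ToySetterNode();
        n.setValue(1, 4);
        System.out.println(n.type);                     // AST_RATIONAL
        n.setValue(2.5, -3);
        System.out.println(n.type);                     // AST_REAL_E
        System.out.println(typeAfterCharacter('*'));    // AST_TIMES
        System.out.println(typeAfterCharacter('x'));    // AST_UNKNOWN
    }
}
```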
Finally, ASTNode also defines some miscellaneous methods for manipulating AST nodes:
- ASTNode(int type) creates a new AST node object with the given type code. The type value must be chosen from among the constants beginning with the characters AST_ defined in org.sbml.libsbml.libsbmlConstants.
- ASTNode() creates a new AST node object with the type AST_UNKNOWN. Its type should be set by the caller as soon as possible using setType(int type).
- long getNumChildren() returns the number of children of this AST node, or 0 if this node has no children.
- addChild(ASTNode) adds the given node as a child of this AST node. Child nodes are added in left-to-right order.
- prependChild(ASTNode) adds the given node as a child of this AST node. This method adds child nodes in right-to-left order.
- ASTNode getChild(long n) returns the nth child of this AST node, or null if this node has no nth child (n > (ASTNode.getNumChildren() - 1)).
- ASTNode getLeftChild() returns the left child of this AST node. This is equivalent to ASTNode.getChild(0).
- ASTNode getRightChild() returns the right child of this AST node, or null if this node has no right child.
- swapChildren(ASTNode that) swaps the children of this AST node with the children of that AST node.
- setType(int type) sets the type of this AST node to the given type code. The value must be chosen from among the constants beginning with the characters AST_ defined in org.sbml.libsbml.libsbmlConstants.
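The child-list methods can be illustrated with another toy class. ToyTreeNode is hypothetical, not libSBML's ASTNode; it mirrors the described behavior of addChild, prependChild, getChild, getLeftChild, getRightChild and swapChildren on a plain java.util.List. Treating the last child as the right child when there are at least two children is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy node (NOT libSBML's ASTNode) mirroring the described child-list methods.
class ToyTreeNode {
    final String label;
    private List<ToyTreeNode> children = new ArrayList<>();

    ToyTreeNode(String label) {
        this.label = label;
    }

    long getNumChildren() { return children.size(); }

    void addChild(ToyTreeNode child) { children.add(child); }         // appends at the right end

    void prependChild(ToyTreeNode child) { children.add(0, child); }  // inserts at the left end

    // Returns the nth child, or null when n > getNumChildren() - 1.
    ToyTreeNode getChild(long n) {
        return (n >= 0 && n < children.size()) ? children.get((int) n) : null;
    }

    ToyTreeNode getLeftChild() { return getChild(0); }

    // Assumption of this sketch: the right child is the last child, if there are two or more.
    ToyTreeNode getRightChild() {
        return children.size() > 1 ? children.get(children.size() - 1) : null;
    }

    // Exchanges the entire child lists of the two nodes.
    void swapChildren(ToyTreeNode that) {
        List<ToyTreeNode> tmp = this.children;
        this.children = that.children;
        that.children = tmp;
    }

    // Builds a node whose children end up in the order a, b, c.
    static ToyTreeNode sample() {
        ToyTreeNode root = new ToyTreeNode("+");
        root.addChild(new ToyTreeNode("b"));
        root.addChild(new ToyTreeNode("c"));
        root.prependChild(new ToyTreeNode("a"));
        return root;
    }

    public static void main(String[] args) {
        ToyTreeNode root = sample();
        ToyTreeNode other = new ToyTreeNode("*");
        System.out.println(root.getLeftChild().label + " " + root.getRightChild().label); // a c
        System.out.println(root.getChild(3));                                           // null
        root.swapChildren(other);
        System.out.println(root.getNumChildren() + " " + other.getNumChildren());       // 0 3
    }
}
```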
As mentioned above, applications often can avoid working with raw MathML by using either libSBML's text-string interface or the AST API. However, when needed, reading MathML content directly and creating ASTs, as well as the converse task of writing MathML, is easily done using two methods designed for this purpose:
- static ASTNode libsbml.readMathMLFromString(java.lang.String) reads raw MathML from a text string, constructs an AST from it, then returns the root AST node of the resulting expression tree.
- static java.lang.String libsbml.writeMathMLToString(ASTNode) writes an AST to a string. In the Java interface the result is an ordinary java.lang.String, so the caller does not need to free it.
The example program given above demonstrates the use of these methods.