Erlsom Reference

Function Index

compile_xsd(XSD) -> {ok, Model}

Equivalent to compile_xsd(XSD, []).

compile_xsd(XSD, Options) -> {ok, Model}

XSD     = [int()]
Options = [Option]
Option  = {prefix, Prefix} |
          {type_prefix, TypePrefix} |
          {group_prefix, GroupPrefix} |
          {include_fun, Include_fun} |
          {include_dirs, Include_dirs} |
          {include_files, Include_files} |
          {strict, boolean()} |
          {include_any_attribs, boolean()}

Model   = the internal representation of the XSD

Compiles an XSD into a structure to be used by erlsom:scan() and erlsom:write(). Returns {ok, Model} or {error, Error}.

XSD can be an encoded binary (see section on character encoding) or a decoded list of Unicode code points.

  • Prefix is prefixed to the record names in the XSD. It should be a string or 'undefined'. If it is 'undefined', no prefix will be applied. The default is 'undefined' (no prefix). The prefix specified with this option is applied to the records that correspond to types from the target namespace of the specified XSD. Different prefixes can be specified for XSDs that are imported, see the other options below.

    Note that erlsom:write() uses the prefixes to assign the namespaces. As a consequence, you should use prefixes if your XML documents use elements from more than one namespace (or if they mix namespace-qualified elements with unqualified ones).

  • TypePrefix is prefixed to the record names that correspond to type definitions in the XSD. It should be a string.

    Record definitions are created for elements, groups and types. In the XSD there may be groups, elements and types with the same name; this would lead to more than one record with the same name. In order to avoid the problems that this would create, it is possible to specify a prefix that will be put in between the namespace prefix (see above) and the name of the type.

  • GroupPrefix is prefixed to the record names that correspond to group definitions in the XSD. It should be a string. See the explanation provided above for the TypePrefix option for the background of this option.

  • Include_fun is a function that finds the files that are included or imported in the XSD. It should be a function that takes 4 arguments:

    • Namespace (from the XSD). This is a string or 'undefined'
    • SchemaLocation (from the XSD). This is a string or 'undefined'
    • Include_files. This is the value of the ‘include_files’ option if this option was passed to compile_xsd(); [] otherwise.
    • Include_dirs. This is the value of the ‘include_dirs’ option if this option was passed to compile_xsd(); 'undefined' otherwise.

    Include_fun should return {XSD, Prefix}, where XSD = string() and Prefix = string() or 'undefined'. If Prefix is 'undefined', ‘P’ will be used.

    Include_fun defaults to a function that uses the Include_dirs and Include_files options as specified below.

    • Include_files is a list of tuples {Namespace, Prefix, Location}. Default is [].

    • Include_dirs is a list of directories (strings). It defaults to ["."].

    Behavior for include and import:

    If the 'include_fun' option was specified, this function will be called. It should return both the contents of the file as a string and the prefix (a tuple {Xsd, Prefix}).

    Otherwise, if the 'include_files' option is present, the list provided with this option will be searched for a matching namespace. If this is found, the specified prefix will be used. If a file is also specified, then this file will be used. If no file is specified (value is undefined), then the 'location' attribute and the 'include_dirs' option will be used to locate the file.

    If the 'include_files' option is not present, or if the namespace is not found, then the file will be searched for in the include_dirs (based on the 'location' attribute). No prefix will be used.

  • If strict == false (this is the default), then only the XML Schema data types integer, int, boolean and QName are mapped to the corresponding Erlang data type. All other data types are mapped to string().

    If strict == true, the XML Schema data types float and double are also mapped to Erlang float(), and all data types derived from integer (nonPositiveInteger, negativeInteger, long, short, byte, nonNegativeInteger, unsignedLong, unsignedInt, unsignedShort, unsignedByte, positiveInteger) are mapped to integer(). For the integer types it is also checked whether the value is within the range specified for the XML Schema type.

  • If include_any_attribs == true (this is the default), then the second element of each record created by erlsom:scan(Xml, Model) will be a list that contains any attributes in the corresponding XML element that are not explicitly declared in the XSD. If include_any_attribs == false, this extra element will not be present in the result of erlsom:scan(Xml, Model); attributes in Xml that are not explicitly declared in the XSD are then simply ignored and are not visible in the output.
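The options above can be combined in a single call. The following is a minimal sketch, not the library's own code: the module name, the prefix values and the idea of resolving every include from one local directory are assumptions made for illustration. It also assumes that included schemas are UTF-8 encoded and that a schemaLocation is always present.

```erlang
-module(xsd_compile_sketch).
-export([include_fun/1, options/1, compile/2]).

%% Builds a hypothetical include_fun that looks up every included or
%% imported schema in Dir, using only the file name of SchemaLocation.
include_fun(Dir) ->
    fun(_Namespace, SchemaLocation, _IncludeFiles, _IncludeDirs)
          when is_list(SchemaLocation) ->
        Path = filename:join(Dir, filename:basename(SchemaLocation)),
        {ok, Bin} = file:read_file(Path),
        %% Return {XSD, Prefix}: the schema text as a string, and
        %% 'undefined' so that the documented default prefix applies.
        {unicode:characters_to_list(Bin, utf8), undefined}
    end.

%% Illustrative option list; the prefix strings are arbitrary examples.
options(Dir) ->
    [{prefix, "po"},
     {type_prefix, "t_"},
     {strict, true},
     {include_any_attribs, false},
     {include_fun, include_fun(Dir)}].

%% Compiles XsdFile with the options above.
compile(XsdFile, Dir) ->
    erlsom:compile_xsd_file(XsdFile, options(Dir)).
```

Keeping the option list in its own function makes it easier to pass the same compile options to other functions that expect them.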

compile_xsd_file(XSD) -> {ok, Model}

Equivalent to compile_xsd_file(XSD, []).

compile_xsd_file(XSD, Options) -> {ok, Model}

As compile_xsd(), but taking its input from a file.

add_xsd_file(FileName, Options, Model) -> {ok, Model}

Compiles an XSD file (FileName), and adds the elements defined by this XSD to Model. The purpose is to add elements (namespaces) to a model that uses the XML Schema ‘any’ construct. Only elements that are part of the model will be part of the output of ‘scan()’! See the soap example for an example where this is used.

See compile_xsd() for a description of the options.

scan(XML, Model) -> {ok, Struct, Rest}

Equivalent to scan(XML, Model, []).

scan(XML, Model, Options) -> {ok, Struct, Rest}

XML     = [int()] or an encoded binary
Model   = the internal representation of the XSD, result of erlsom:compile_xsd()
Options = [Option]
Option  = {continuation_function, Continuation_function, Continuation_state} |
          {output_encoding, utf8}
Struct  = the translation of the XML document to an Erlang data structure
Rest    = list of characters that follow after the end of the XML document

Translates an XML document that conforms to the XSD to a structure of records.

Returns {ok, Struct, Rest} or {error, Error}.

Error has the following structure: [{exception, Exception}, {stack, Stack}, {received, Event}], where:

  • Exception is the exception that was thrown by the program
  • Stack is a representation of the 'stack' that is maintained by erlsom.
  • Event is the sax event that erlsom was processing when it ran into problems.
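The two return shapes described above can be handled with a small wrapper. This is only a sketch: the module and function names are hypothetical, and the message format is an arbitrary choice. It relies only on the documented structure of Error as a list of tagged tuples.

```erlang
-module(scan_result_sketch).
-export([scan/2, describe/1]).

%% Runs erlsom:scan/2 and normalises the result.
scan(Xml, Model) ->
    describe(erlsom:scan(Xml, Model)).

%% Success is passed through unchanged.
describe({ok, Struct, Rest}) ->
    {ok, Struct, Rest};
%% The documented error shape: pick out the exception and the SAX event.
describe({error, Error}) when is_list(Error) ->
    Exception = proplists:get_value(exception, Error),
    Event = proplists:get_value(received, Error),
    Msg = io_lib:format("scan failed: ~p while processing ~p",
                        [Exception, Event]),
    {error, lists:flatten(Msg)};
%% Any other error term is reported as is.
describe({error, Other}) ->
    {error, lists:flatten(io_lib:format("scan failed: ~p", [Other]))}.
```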

If specified, the continuation function is called whenever the end of the input XML document is reached before the parsing of the XML has finished. The function should have 1 argument (Continuation_state). It should return a tuple {NewData, NewState}, where NewData is the next block of data and NewState is the information that is passed to the next invocation. NewData is again a list of Unicode code points or binary data, but the data type has to be the same for each invocation, and it has to match the data type of XML. Note: if the encoding of the document supports multi-byte characters (UTF8, UTF16), you don’t have to ensure that each block of data contains only complete characters. In case of UTF16 encoding, however, you do have to ensure that you return an even number of bytes.

If the ‘output_encoding’ option is used, the text values will be binary encoded - but the values that are specified as integer in the XSD will still be integers.

scan_file(XMLFile, Model) -> {ok, Struct, Rest}

As scan, but taking its input from a file.

write(Struct, Model) -> {ok, XML}

Equivalent to write(Struct, Model, []).

write(Struct, Model, Options) -> {ok, XML}

Struct  = a structure that represents an XML document
Model   = the internal representation of the XSD
Options = [Option]
Option  = {output, list | charlist | binary}
XML     = list() | charlist() | binary()

Translates a structure of records to an XML document. It is the inverse of erlsom:scan().

The output option can be used to specify the format of the output. The possible values are:

  • list: a list of Unicode code points (integers). This is the default.
  • charlist: a deep list of Unicode code points and UTF-8 encoded binaries.
  • binary: a UTF-8 encoded binary.
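Whichever output format is chosen, the result can be brought to a single representation with the standard unicode module. The sketch below assumes the three formats are exactly as described in the list above. The module and function names are hypothetical.

```erlang
-module(write_output_sketch).
-export([to_utf8/1, write_binary/2]).

%% Accepts a list of code points, a deep list mixing code points and
%% UTF-8 binaries, or a UTF-8 binary, and returns one UTF-8 binary.
to_utf8(Xml) ->
    unicode:characters_to_binary(Xml).

%% Asks erlsom:write/3 for a binary directly.
write_binary(Struct, Model) ->
    {ok, Xml} = erlsom:write(Struct, Model, [{output, binary}]),
    Xml.
```

When the document is going to a socket or a file anyway, requesting binary output avoids the extra conversion step.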

write_xsd_hrl_file(XSD, Output, Options) -> ok

XSD        = the name of the file that contains the XSD,
Output     = the name of the output file,
Options    = [ CompileOpt | HrlOption ]
HrlOption  = {attribute_hrl_prefix, string()}
CompileOpt = an Option as defined for compile_xsd().

Produces a set of record definitions for the types defined by the XSD. Note that the compile options have to be identical to those that are passed to compile_xsd().

The {attribute_hrl_prefix, string()} option is used to specify a prefix for the record fields that represent attributes. It defaults to "". For example, if the option {attribute_hrl_prefix, "attr_"} is passed to this function, the attribute id in the XML Schema will be represented by the field attr_id in the generated record. This is useful when a complex type has an attribute and an element with the same name.

parse_sax(XML, Acc0, EventFun, Options) -> {ok, AccOut, Rest}

XML      = [int()], a list of Unicode code points
Acc0     = a term() that is passed to the EventFun.
EventFun = a fun() that is called by the parser whenever it has parsed a bit of the XML input
Options  = [Option]
Option   = {continuation_function, CFunction, CState} | {output_encoding, utf8} |
           {expand_entities, true | false} | {max_entity_depth, int() | infinity} |
           {max_entity_size, int() | infinity} | {max_nr_of_entities, int() | infinity} |
           {max_expanded_entity_size, int() | infinity}

AccOut   = the result of the last invocation of EventFun.
Rest     = list of characters that follow after the end of the XML document

EventFun should accept the following arguments:

  • Event, a tuple that describes the event, see the section on the Sax parser
  • AccIn, a term() - Acc0 for the first invocation, and the result from the previous invocation for each of the following invocations.

EventFun should return AccOut, a term() that will be passed back to the next invocation of EventFun.
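As an illustration of the accumulator pattern, the sketch below counts elements by local name. The shape of the startElement event, {startElement, Uri, LocalName, Prefix, Attributes}, is an assumption taken from the section on the Sax parser, which is not reproduced here. The module and function names are hypothetical.

```erlang
-module(sax_count_sketch).
-export([on_event/2, count_elements/1]).

%% EventFun: bump the counter for every start tag (assumed event shape),
%% and pass the accumulator through unchanged for all other events.
on_event({startElement, _Uri, LocalName, _Prefix, _Attributes}, Counts) ->
    maps:update_with(LocalName, fun(N) -> N + 1 end, 1, Counts);
on_event(_OtherEvent, Counts) ->
    Counts.

%% Parses Xml with an empty map as Acc0 and returns the final counts.
count_elements(Xml) ->
    {ok, Counts, _Rest} = erlsom:parse_sax(Xml, #{}, fun on_event/2, []),
    Counts.
```

Because EventFun is an ordinary two-argument function, it can be exercised with lists:foldl/3 over a hand-written list of events, without running the parser.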

CFunction should be a function that takes 2 arguments: Tail and State.

  • Tail is the (short) list of characters (or a short binary) that could not yet be parsed because it is (or might be) an incomplete token, or because an encoded character is not complete. Since this still has to be parsed, CFunction should include this in front of the next block of data.
  • State is information that is passed by the parser to the callback function transparently. This can be used to keep track of the location in the file etc.

CFunction returns {NewData, NewState}, where NewData is a list of characters/unicode code points/binary, and NewState the new value for the State. NewData has to be in the same type of encoding as the first part of the document.

Note: if the encoding of the document supports multi-byte characters (UTF8, UTF16), you don’t have to ensure that each block of data contains only complete characters. In case of UTF16, however, you do have to ensure that you return an even number of bytes.
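A continuation function along these lines might look as follows. This is a sketch under several assumptions: the input is binary, State is simply the list of chunks that have not been handed to the parser yet, and the option tuple is written as {continuation_function, Function, State}. The module and function names are hypothetical, and what the parser does when no data is left is not covered here.

```erlang
-module(sax_continuation_sketch).
-export([next_chunk/2, parse_chunks/3]).

%% CFunction: put the unparsed Tail in front of the next chunk and
%% return the remaining chunks as the new state.
next_chunk(Tail, [Chunk | Rest]) when is_binary(Tail), is_binary(Chunk) ->
    {<<Tail/binary, Chunk/binary>>, Rest};
%% Nothing left to read: hand the tail back unchanged.
next_chunk(Tail, []) ->
    {Tail, []}.

%% Feeds a non-empty list of binary chunks to the parser one at a time.
parse_chunks([First | Rest], Acc0, EventFun) ->
    Options = [{continuation_function, fun next_chunk/2, Rest}],
    erlsom:parse_sax(First, Acc0, EventFun, Options).
```

In a real application the state would more likely hold a file handle and a read position, so that only one block of the document is in memory at any time.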

The ‘output_encoding’ option determines the encoding of the 'character data': element values and attribute values. The only supported encoding at this moment is 'utf8'. The default is string().

There are a number of options to protect against malicious entities, such as the 'billion laughs' attack. The defaults are intended to allow most "bona fide" use of entities while blocking malicious cases. Depending on the situation it may make sense to select settings that are more or less restrictive.

  • expand_entities: if set to 'false', entities will not be expanded. Default: true
  • max_entity_depth: limits the level of nesting of entities. The default value is 2, which means that an entity can refer to 1 or more other entities, but none of those can contain entity references.
  • max_entity_size: limits the size of a single entity definition. Default: 2000
  • max_nr_of_entities: limits the number of entities that can be defined. Default: 100
  • max_expanded_entity_size: limits the total number of characters that can be introduced in an XML document by expansion of entities. Default: 10,000,000.
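For input from untrusted sources the limits can be tightened. The option names below are the ones listed above; the numeric values are arbitrary illustrations and not recommendations. The module and function names are hypothetical.

```erlang
-module(sax_entity_limits_sketch).
-export([no_entities/0, tight_limits/0, parse_untrusted/3]).

%% Most restrictive choice: do not expand entities at all.
no_entities() ->
    [{expand_entities, false}].

%% Entities stay enabled, but with limits well below the defaults.
%% The numbers are examples only.
tight_limits() ->
    [{expand_entities, true},
     {max_entity_depth, 1},
     {max_entity_size, 500},
     {max_nr_of_entities, 10},
     {max_expanded_entity_size, 100000}].

%% Parses Xml with the tightened limits.
parse_untrusted(Xml, Acc0, EventFun) ->
    erlsom:parse_sax(Xml, Acc0, EventFun, tight_limits()).
```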

simple_form(XML) -> {ok, SimpleFormElement, Rest}

Equivalent to simple_form(XML, []).

simple_form(XML, Options) -> {ok, SimpleFormElement, Rest}

XML     = [int()] or an encoded binary
Options = [Option]
Option  = {nameFun, NameFun} | {output_encoding, utf8}

SimpleFormElement = {Tag, Attributes, Content}
Rest    = list of characters that follow after the end of the XML document

Tag is a string (unless otherwise specified through the nameFun option, see below), Attributes = [{AttributeName, Value}], and Content is a list of SimpleFormElements and/or strings.

NameFun is a function with 3 arguments: Name, Namespace, Prefix. It should return a term. It is called for each tag and attribute name. The result will be used in the output. The default is Name if Namespace == undefined, and a string {Namespace}Name otherwise.
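A small sketch of a NameFun together with a walk over the resulting structure follows. It assumes the default string output (no output_encoding option), so that text content arrives as lists and child elements as 3-tuples. The module and function names are hypothetical.

```erlang
-module(simple_form_sketch).
-export([local_name/3, parse/1, text/1]).

%% NameFun that ignores namespace and prefix and keeps the bare name.
local_name(Name, _Namespace, _Prefix) ->
    Name.

%% Parses Xml into simple form using the NameFun above.
parse(Xml) ->
    erlsom:simple_form(Xml, [{nameFun, fun local_name/3}]).

%% Collects all character data below an element, in document order.
text({_Tag, _Attributes, Content}) ->
    lists:append([text(Item) || Item <- Content]);
text(String) when is_list(String) ->
    String.
```

Dropping the namespace keeps tags short, but it also makes elements with the same local name from different namespaces indistinguishable, so it only suits documents where that cannot happen.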

erlsom_lib:toUnicode(XML) -> DecodedXML

XML        = the XML in binary form.
DecodedXML  = the XML in the form of a list of Unicode code points.

Decodes the XML, see the section on character decoding.

erlsom_lib:find_xsd(Namespace, Location, Include_dirs, Include_list) -> {XSD, Prefix}

Namespace = string() | 'undefined' (taken from the XSD)
Location  = string() | 'undefined' (taken from the XSD)
Include_dirs: This is the value of the Include_dirs option if this option was passed to compile_xsd(); 'undefined' otherwise.
Include_list: This is the value of the include_files option if this option was passed to compile_xsd(); 'undefined' otherwise.

The function erlsom_lib:find_xsd can be passed to compile_xsd as the value for the 'include_fun' option. It will attempt to get imported XSDs from the internet (if the import, include or redefine statement includes a ‘location’ attribute in the form of a URL).

If find_xsd cannot find the file on the internet, it will attempt to find the file using the standard function, see the description provided above with the compile_xsd function.

erlsom_lib:detect_encoding(Document) -> {Encoding, Binary}

Document = the XML document, either in binary form or as a list
Encoding = the encoding, as an atom
Binary   = the XML document in binary form.

Tries to detect the encoding. It looks at the first couple of bytes. If these bytes cannot give a definitive answer, it looks into the xml declaration.

Possible values for Encoding:

  • ucs4be
  • ucs4le
  • utf16be
  • utf16le
  • utf8
  • iso_8859_1

The second return value is identical to the input if the input was in binary form, and the translation to the binary form if the input was a list.

(The basis of this function was copied from xmerl_lib, but it was extended to look into the xml declaration.)

erlsom_ucs:from_utf8(Data) -> {List, Tail}

Data = a block of data, either as a list of bytes or as a binary 
List = the input translated to a list of Unicode code points
Tail = remaining bytes at the end of the input (a list of bytes). 

These functions are based on the corresponding functions in xmerl_ucs, but they have been modified so that they can be used to translate blocks of data. The end of a block can be in the middle of a series of bytes that together correspond to 1 Unicode code point. The remaining bytes are returned, so that they can be put in front of the next block of data.

Note on performance: the functions work on lists, not binaries! If the input is a binary, this is translated to a list in a first step, since the functions are faster that way. If you are reading the xml document from a file, it is probably fastest to use pread() in such a way that it returns a list, and not a binary.

See the ‘continuation’ example for an example of how this can be used to deal with very large documents (or streams of data).
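The handling of the remaining bytes can be sketched as a fold over the blocks. This is an illustration, not the code of the ‘continuation’ example: the module and function names are hypothetical, and blocks are assumed to be lists of bytes. The decoder is passed in as a fun that returns {List, Tail} so that the same loop works for any of the block decoders.

```erlang
-module(utf8_blocks_sketch).
-export([decode_blocks/1, decode_blocks/2]).

%% Decodes a list of byte-list blocks with erlsom_ucs:from_utf8/1.
decode_blocks(Blocks) ->
    decode_blocks(Blocks, fun erlsom_ucs:from_utf8/1).

%% Decode must take a list of bytes and return {CodePoints, Tail}.
%% The Tail of each block is put in front of the next block.
decode_blocks(Blocks, Decode) ->
    Step = fun(Block, {Acc, Pending}) ->
               {Chars, NewPending} = Decode(Pending ++ Block),
               {[Chars | Acc], NewPending}
           end,
    {Decoded, Tail} = lists:foldl(Step, {[], []}, Blocks),
    %% A non-empty Tail here means the input ended in the middle of a
    %% character.
    {lists:append(lists:reverse(Decoded)), Tail}.
```

For very large inputs it is usually better to process each decoded block as it arrives instead of collecting everything, as the sketch does for brevity.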

Identical to erlsom_ucs:from_utf8/1, but for utf16le.

Identical to erlsom_ucs:from_utf8/1, but for utf16be.

This function has been replaced by scan()! Please use scan().

This function has been replaced by scan_file()! Please use scan_file().

This function has been replaced by write_xsd_hrl_file()! Please use write_xsd_hrl_file().

Obsolete, use parse_sax().

erlsom_sax:parseDocument/4

Obsolete, use parse_sax().

This function has been replaced by compile_xsd()! Please use compile_xsd().

This function has been replaced by compile_xsd_file()! Please use compile_xsd_file().

This function has been replaced by add_xsd_file()! Please use add_xsd_file().