  my $schema = XML::Compile::Schema->new(...);
  my $code   = $schema->compile(READER => ...);
The translator understands schemas, but does not encode that into actions. This module implements those actions to translate from XML into a (nested) Perl HASH structure.
If you want to collect information from the XML structure which is
permitted by any and anyAttribute specifications in the schema, you
have to implement that yourself. The problem is that XML::Compile
has less knowledge than you about the possible data.
By default, the anyAttribute specification is ignored. When TAKE_ALL
is given, all attributes which fulfil the name-space requirement are
added to the returned data-structure. As key, the absolute element name
will be used, with the related unparsed XML element as value.
In the current implementation, if an explicit attribute is also covered by the name-spaces permitted by the anyAttribute definition, then it will also appear in that list (and hence the handler will be called as well).
Use XML::Compile::Schema::compile(anyAttribute) to write your own handler, to influence the behavior. The handler will be called for each attribute, and you must return a list of pairs of derived information. When the returned list is empty, the attribute data is lost. The value may be a complex structure.
Say your schema looks like this:
  <schema targetNamespace="http://mine"
     xmlns:me="http://mine" ...>

    <element name="el">
      <complexType>
        <attribute name="a" type="xs:int" />
        <anyAttribute namespace="##targetNamespace"
           processContents="lax" />
      </complexType>
    </element>

    <simpleType name="non-empty">
      <restriction base="NCName" />
    </simpleType>
  </schema>
Then, in an application, you write:
  my $r = $schema->compile
    ( READER       => pack_type('http://mine', 'el')
    , anyAttribute => 'TAKE_ALL'
    );
  # or lazy: READER => '{http://mine}el'

  my $h = $r->( <<'__XML' );
  <el xmlns:me="http://mine">
    <a>42</a>
    <b type="me:non-empty">
       everything
    </b>
  </el>
  __XML

  use Data::Dumper 'Dumper';
  print Dumper $h;
The output is something like
  $VAR1 =
    { a => 42
    , '{http://mine}a' => ... # XML::LibXML::Node with <a>42</a>
    , '{http://mine}b' => ... # XML::LibXML::Node with <b>everything</b>
    };
You can improve the reader with a callback. When you know that the
extra attribute is always of type non-empty, then you can do
  my $read = $schema->compile
    ( READER       => '{http://mine}el'
    , anyAttribute => \&filter
    );

  my $anyAttRead = $schema->compile
    ( READER => '{http://mine}non-empty'
    );

  sub filter($$$$)
  {   my ($fqn, $xml, $path, $translator) = @_;
      return () if $fqn ne '{http://mine}b';
      (b => $anyAttRead->($xml));
  }

  my $h = $read->( see above );
  print Dumper $h;
Which will result in
  $VAR1 =
    { a => 42
    , b => 'everything'
    };
The filter will be called twice, but return nothing in the first case. You can implement any kind of complex processing in the filter.
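When more than one extra attribute needs its own treatment, the filter can be driven by a lookup table. The following is only a sketch: the name {http://mine}c and its comma-splitting rule are hypothetical, and plain strings stand in for the XML nodes so that the code runs without XML::Compile installed.

```perl
use strict;
use warnings;

# Hypothetical per-attribute handlers, keyed by fully qualified name.
# Each receives the raw value and returns a list of key/value pairs.
my %handlers =
  ( '{http://mine}b' => sub { (b => $_[0]) }
  , '{http://mine}c' => sub { (c => [ split /,/, $_[0] ]) }
  );

# Same argument order as the filter in the example above.
sub filter
{   my ($fqn, $xml, $path, $translator) = @_;
    my $handler = $handlers{$fqn}
        or return ();             # not in the table: attribute is dropped
    $handler->($xml);
}

# Stand-alone demonstration: strings take the place of the XML nodes.
my %result = map { filter($_->[0], $_->[1], 'el', undef) }
    [ '{http://mine}a' => '42' ],
    [ '{http://mine}b' => 'everything' ],
    [ '{http://mine}c' => 'x,y' ];

print join(', ', sort keys %result), "\n";
```

In a real reader the table would be passed indirectly, via anyAttribute => \&filter, and each handler would receive the node rather than a string.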
By default, the any definition in a schema will ignore all elements
from the container which are not used. Also in this case TAKE_ALL
is required to produce any results. SKIP_ALL will ignore all results,
although these are still processed for validation needs.
The minOccurs and maxOccurs of any are ignored: the number of
elements is always unbounded. Therefore, you will get an array of
elements back per type.
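Because each name maps to an array, code which consumes the result has to loop over the nodes. A minimal sketch of that, assuming the keys use the same {namespace}local form as in the anyAttribute example; the element names are hypothetical, and strings stand in for the XML::LibXML nodes:

```perl
use strict;
use warnings;

# Hypothetical reader result for a container with an "any" and TAKE_ALL.
# Every name collected via "any" holds an ARRAY, even for one occurrence.
my $data =
  { id                   => 5
  , '{http://mine}extra' => [ '<extra>1</extra>', '<extra>2</extra>' ]
  , '{http://mine}note'  => [ '<note>hi</note>' ]
  };

# Count the occurrences per fully qualified element name.
my %count;
for my $key (sort keys %$data)
{   next unless ref($data->{$key}) eq 'ARRAY';   # skip regular fields
    $count{$key} = @{ $data->{$key} };
    print "$key: $count{$key} node(s)\n";
}
```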
[available since 0.86]
ComplexType and ComplexContent components can be declared with the
mixed="true" attribute. This implies that text is not limited
to the content of containers, but may also be used in between elements.
Usually, you will only find ignorable white-space between elements.
In this example, the a container is marked to be mixed:

  <a id="5"> before <b>2</b> after </a>
Often the "mixed" option bends one of two ways: either the element is needed as text, or the element should be parsed and the text ignored. The reader has various options to avoid the need of processing raw XML::LibXML nodes.
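With XML::Compile::Schema::compile(mixed_elements) set to

ATTRIBUTES (the default)
  a HASH is returned with the attributes processed; the node itself
  is found under the key _ (here "id" is a declared attribute):
    $r = { id => 5, _ => $xmlnode };

TEXTUAL
  like the previous, but with the textual content of the node:
    $r = { id => 5, _ => ' before 2 after '};

STRUCTURAL
  the mixed-in text is removed and the elements are parsed:
    $r = { id => 5, b => 2 };

XML_NODE
  the XML::LibXML node itself is returned:
    $r = $xmlnode;

XML_STRING
  the mixed node is returned as an XML string:
    $r = '<a id="5"> before <b>2</b> after </a>';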
When some of your mixed elements need different behavior from other elements, then you have to use the normal hooks for those specific cases.
The before hook receives an XML::LibXML::Node object and
the path string. It must return a new (or the same) XML node which
will be used from then on. You can probably best modify a clone of
the node, not the original as provided by the user. When undef
is returned, the whole node will disappear.
This hook offers a predefined PRINT_PATH.

  $schema->addHook(path => qr/./, before => 'PRINT_PATH');
Your replace hook should return a list of key-value pairs. To
produce it, it will get the XML::LibXML::Element, the translator settings
as HASH, the path, and the localname.

This hook has a predefined SKIP, which will not process the
found element, but simply return the string "SKIPPED" as value.
This way, a whole tree of unneeded translations can be avoided.
Sometimes, the Schema spec is such a mess that XML::Compile cannot automatically translate it. I have seen cases where confusion over name-spaces is created: a choice between three elements with the same name but different types. In such a case you may use XML::LibXML::Simple to translate a part of your tree. Simply:
  use XML::LibXML::Simple qw/XMLin/;
  $schema->addHook
    ( type    => ...bad-type-definition...
    , replace =>
        sub { my ($xml, $args, $path, $local) = @_;
              ($local => XMLin($xml, ...));
            }
    );
The data is collected, and passed as second argument after the XML node.
The third argument is the path. Be careful that the collected data
might be a SCALAR (for simpleType). Return a HASH or a SCALAR. undef
may work, unless it is the value of a required element you throw away.
This hook also offers a predefined PRINT_PATH. Besides, it
has XML_NODE, ELEMENT_ORDER, and ATTRIBUTE_ORDER, which will
result in additional fields in the HASH, respectively containing the
node which was processed, the element names, and the attribute names.
The keys start with an underscore _.
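A hand-written hook of this kind has to cope with both shapes of collected data. The sketch below assumes the argument order described above (node, data, path); the _path field and the sub name are hypothetical, and the registration is shown only as a comment so that the code runs without XML::Compile installed.

```perl
use strict;
use warnings;

# Sketch of a hook which runs after the data was collected.
sub after_hook
{   my ($xml, $data, $path) = @_;

    # simpleType content arrives as a plain SCALAR: pass it through.
    return $data unless ref($data) eq 'HASH';

    # complexType content arrives as a HASH: return an extended copy.
    my %copy = (%$data, _path => $path);
    \%copy;
}

# Registration would look roughly like this (type left elided):
#   $schema->addHook(type => ..., after => \&after_hook);

# Stand-alone demonstration; undef takes the place of the XML node.
my $simple  = after_hook(undef, 42, 'el/a');
my $complex = after_hook(undef, { a => 42 }, 'el');
print "$simple $complex->{_path}\n";
```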
In a typemap, a relation between an XML element type and a Perl class (or object) is made. Each translator back-end will implement this a little differently. This section is about how the reader handles typemaps.
Usually, an XML type will be mapped on a Perl class. The Perl class
implements the fromXML method as constructor.
  $schema->typemap($sometype => 'My::Perl::Class');

  package My::Perl::Class;
  ...
  sub fromXML {
      my ($class, $data, $xmltype) = @_;
      my $self = $class->new($data);
      ...
      $self;
  }
Your method returns the data which will be included in the result tree
of the reader. You may return an object, the unmodified $data, or
undef. When undef is returned, this may fail the schema parser
when the data element is required.
In the simplest implementation, the class stores its data exactly as the XML structure:
  package My::Perl::Class;
  sub fromXML {
      my ($class, $data, $xmltype) = @_;
      bless $data, $class;
  }

  # The same, even shorter:
  sub fromXML { bless $_[1], $_[0] }
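A slightly richer variant can be tried without a schema by calling fromXML directly, which is roughly what the reader does once the data has been collected. In this sketch the _type field and the two accessors are hypothetical additions:

```perl
use strict;
use warnings;

package My::Perl::Class;

# Constructor for the reader: copy the HASH and remember the XML type.
sub fromXML
{   my ($class, $data, $xmltype) = @_;
    my $self = bless { %$data }, $class;    # shallow copy of the data
    $self->{_type} = $xmltype;
    $self;
}

sub type { $_[0]{_type} }
sub a    { $_[0]{a} }

package main;

# Stand-alone demonstration with a hand-made HASH as collected data.
my $obj = My::Perl::Class->fromXML({ a => 42 }, '{http://mine}el');
print ref($obj), ' ', $obj->a, ' ', $obj->type, "\n";
```

Copying the HASH instead of blessing it in place keeps the object independent of whatever the caller still holds.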
Another option is to implement an object factory: one object which creates
other objects. In this case, the $xmltype parameter can be of use,
to have one object spawning many different other objects.
  my $object = My::Perl::Class->new(...);
  $schema->typemap($sometype => $object);

  package My::Perl::Class;
  sub fromXML {
      my ($object, $data, $xmltype) = @_;
      return Some::Other::Class->new($data);
  }
This object factory may be a very simple solution when you map XML onto
objects which are not under your control, where there is no way to
add the fromXML method.
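One way such a factory could select the class to construct is a table keyed by XML type. This is only a sketch: it assumes that the factory's fromXML receives the data first and the XML type second, as the class version does, and all package names, type names, and the fallback behavior are hypothetical.

```perl
use strict;
use warnings;

# Two classes "not under your control": they only offer new().
package Some::Point;  sub new { my ($c, $d) = @_; bless { %$d }, $c }
package Some::Label;  sub new { my ($c, $d) = @_; bless { %$d }, $c }

package My::Factory;

# The factory remembers which class belongs to which XML type.
sub new
{   my ($class, %classes) = @_;
    bless { classes => \%classes }, $class;
}

sub fromXML
{   my ($self, $data, $xmltype) = @_;
    my $target = $self->{classes}{$xmltype}
        or return $data;          # unknown type: keep the plain data
    $target->new($data);
}

package main;

my $factory = My::Factory->new
  ( '{http://mine}point' => 'Some::Point'
  , '{http://mine}label' => 'Some::Label'
  );

# With a schema, the same object would be registered for each type:
#   $schema->typemap($type => $factory);

my $p = $factory->fromXML({ x => 1 }, '{http://mine}point');
my $u = $factory->fromXML({ y => 2 }, '{http://mine}other');
print ref($p), ' ', ref($u), "\n";
```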
The light version of an object factory works with CODE references.
  $schema->typemap($t1 => \&myhandler);

  sub myhandler {
      my ($backend, $data, $type) = @_;
      return My::Perl::Class->new($data)
          if $backend eq 'READER';
      $data;
  }

  # shorter
  $schema->typemap($t1 => sub {My::Perl::Class->new($_[1])} );
Internally, the typemap is simply translated into an "after" hook for the
specific type. After the data was processed via the usual mechanism,
the hook will call method fromXML on the class or object you specified
with the data which was read. You may still use "before" and "replace"
hooks, if you need them.
Syntactic sugar:
  $schema->typemap($t1 => 'My::Package');
  $schema->typemap($t2 => $object);

is comparable to

  $schema->typemap($t1 => sub {My::Package->fromXML(@_)});
  $schema->typemap($t2 => sub {$object->fromXML(@_)} );
with some extra checks.