The IMSC reader (ttconv/imsc/reader.py) converts IMSC 1.1 Text Profile documents into the data model. The objective is to preserve rendering fidelity but not necessarily structure, e.g. referential styling is flattened.
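To illustrate what flattening referential styling means, the sketch below resolves style references into a single set of style values per element, using only the standard library. It is not the ttconv implementation: the helper name flatten_styles is hypothetical, and the sketch assumes a single level of style references, ignoring chained references, region styles and inheritance.

```python
import xml.etree.ElementTree as et

TT_NS = "http://www.w3.org/ns/ttml"
TTS_NS = "http://www.w3.org/ns/ttml#styling"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:lang="en">
  <head>
    <styling>
      <style xml:id="s1" tts:color="red" tts:fontStyle="italic"/>
    </styling>
  </head>
  <body>
    <div>
      <p style="s1" tts:color="blue">Hello</p>
    </div>
  </body>
</tt>"""


def flatten_styles(root):
    """Hypothetical helper: map each styled element to {style name: value}."""
    style_tag = f"{{{TT_NS}}}style"
    prefix = f"{{{TTS_NS}}}"

    # Index every <style> element by its xml:id.
    styles = {
        s.get(XML_ID): s for s in root.iter(style_tag) if s.get(XML_ID) is not None
    }

    flattened = {}
    for elem in root.iter():
        if elem.tag == style_tag:
            continue
        resolved = {}
        # Referenced styles are applied first, in the order they are listed.
        for ref in elem.get("style", "").split():
            referenced = styles.get(ref)
            if referenced is None:
                continue  # dangling references are ignored in this sketch
            for name, value in referenced.attrib.items():
                if name.startswith(prefix):
                    resolved[name[len(prefix):]] = value
        # Style attributes specified inline override referenced ones.
        for name, value in elem.attrib.items():
            if name.startswith(prefix):
                resolved[name[len(prefix):]] = value
        if resolved:
            flattened[elem] = resolved
    return flattened


root = et.fromstring(SAMPLE)
p = root.find(f".//{{{TT_NS}}}p")
p_styles = flatten_styles(root)[p]
```

After flattening, the p element carries its own color and fontStyle values and no longer depends on the style element it referenced, which is the sense in which structure is not preserved.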
The IMSC reader accepts as input an XML document that conforms to the ElementTree XML API and returns a model.ContentDocument object.
import xml.etree.ElementTree as et
import ttconv.imsc.reader as imsc_reader
xml_doc = et.parse('src/test/resources/ttml/imsc-tests/imsc1/ttml/timing/BasicTiming007.ttml')
doc = imsc_reader.to_model(xml_doc)
# doc can then be manipulated and written out using any of the writer modules
The input XML document is traversed using depth-first search (DFS). Each XML element encountered is processed using the from_xml() method of the corresponding class in ttconv/imsc/elements.py. For example, ttconv.imsc.elements.PElement.from_xml() is applied to each <p> element. Since the data model is a subset of the IMSC 1.1 model, additional parsing state is preserved across calls to from_xml() by associating each parsed XML element with an instance of the ttconv.imsc.elements.TTMLElement.ParsingContext structure or one of its subclasses.
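The traversal pattern described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in ttconv/imsc/elements.py: the handler classes, the HANDLERS table, the visited list and the fields of ParsingContext are all hypothetical, and the actual from_xml() signatures may differ. The sketch uses xml:lang as an example of state that is carried from parent to child through the context.

```python
import xml.etree.ElementTree as et
from dataclasses import dataclass, field
from typing import Optional

TT_NS = "http://www.w3.org/ns/ttml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


@dataclass
class ParsingContext:
    """Hypothetical per-element parsing state."""
    xml_elem: et.Element
    parent: Optional["ParsingContext"] = None
    lang: str = ""  # example of state inherited from ancestors
    children: list = field(default_factory=list)


class ElementHandler:
    """Hypothetical base class: one subclass per supported XML element."""
    local_name = ""

    @classmethod
    def from_xml(cls, parent_ctx, xml_elem, visited):
        # Inherit xml:lang from the parent context unless overridden here.
        inherited = parent_ctx.lang if parent_ctx is not None else ""
        ctx = ParsingContext(xml_elem, parent_ctx, xml_elem.get(XML_LANG, inherited))
        visited.append((cls.local_name, ctx.lang))

        # Depth-first: each child is fully processed before its next sibling.
        for child in xml_elem:
            handler = HANDLERS.get(child.tag)
            if handler is not None:  # unsupported elements are skipped
                ctx.children.append(handler.from_xml(ctx, child, visited))
        return ctx


class BodyHandler(ElementHandler):
    local_name = "body"


class DivHandler(ElementHandler):
    local_name = "div"


class PHandler(ElementHandler):
    local_name = "p"


# Dispatch table from qualified XML element name to handler class.
HANDLERS = {
    f"{{{TT_NS}}}{h.local_name}": h for h in (BodyHandler, DivHandler, PHandler)
}

SAMPLE = """<body xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <div>
    <p>Hello</p>
    <p xml:lang="fr">Bonjour</p>
  </div>
  <div><p>Bye</p></div>
</body>"""

visited = []
root_ctx = BodyHandler.from_xml(None, et.fromstring(SAMPLE), visited)
```

In this sketch, visited records the elements in the order they are processed, together with the language resolved through the chain of contexts, so the second p is seen as "fr" while its siblings inherit "en" from body.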
To improve code manageability, processing of TTML style and other attributes is conducted in ttconv/imsc/styles_properties.py and ttconv/imsc/attributes.py, respectively. Each style property in ttconv/imsc/styles_properties.py is mapped, as specified by the model_prop member, to a style property of the data model in ttconv/styles_properties.py.
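A minimal sketch of this mapping, assuming a simplified design: each TTML-side property class names its data-model counterpart through model_prop and knows how to parse its own attribute value. Apart from model_prop, every name here (ModelFontStyle, ModelOpacity, extract, BY_QNAME, parse_styles) is hypothetical and chosen for illustration; the classes in ttconv may be organized differently.

```python
from enum import Enum

TTS_NS = "http://www.w3.org/ns/ttml#styling"


class FontStyleType(Enum):
    normal = "normal"
    italic = "italic"
    oblique = "oblique"


# Hypothetical data-model style properties.
class ModelFontStyle:
    pass


class ModelOpacity:
    pass


class StyleProperty:
    """Hypothetical base class for TTML-side style properties."""
    ns = TTS_NS
    local_name = ""
    model_prop = None  # data-model property this TTML property maps to

    @classmethod
    def extract(cls, xml_attrib):
        raise NotImplementedError


class FontStyle(StyleProperty):
    local_name = "fontStyle"
    model_prop = ModelFontStyle

    @classmethod
    def extract(cls, xml_attrib):
        return FontStyleType[xml_attrib]


class Opacity(StyleProperty):
    local_name = "opacity"
    model_prop = ModelOpacity

    @classmethod
    def extract(cls, xml_attrib):
        return float(xml_attrib)


# Lookup table from qualified attribute name to TTML-side property class.
BY_QNAME = {f"{{{p.ns}}}{p.local_name}": p for p in (FontStyle, Opacity)}


def parse_styles(attrib):
    """Return {data-model property: parsed value} for recognized attributes."""
    styles = {}
    for qname, value in attrib.items():
        prop = BY_QNAME.get(qname)
        if prop is not None:  # non-style attributes are handled elsewhere
            styles[prop.model_prop] = prop.extract(value)
    return styles


parsed = parse_styles({
    f"{{{TTS_NS}}}fontStyle": "italic",
    f"{{{TTS_NS}}}opacity": "0.5",
    "begin": "1s",
})
```

Keeping the mapping on the property class means that supporting a new style property only requires adding one class and registering it, without changing the element parsing code.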
ttconv/imsc/namespaces.py and ttconv/imsc/utils.py contain common namespace declarations and utility functions, respectively.
Unit tests include parsing all of the IMSC test documents published by W3C into the data model.