Parsing XML with OOoBasic

OpenOffice.org comes with a full IDE and a scripting language, OOoBasic, that looks like a toy language but is not. This weekend I have been reviewing a lot of OOoBasic code and found that it is a powerful scripting language. One thing that shows off the power of this very high-level language is parsing an XML file. Since OOo itself is built on XML, it is appealing to think that OOo could configure itself.

First there is the powerful UNO framework, which lives in the inner pieces of OpenOffice.org. The UNO API is made up of interfaces, services and methods. The nice thing is the wide array of interfaces that OOoBasic can use and manipulate.

OOoBasic can parse XML in different ways: with SAX, a small and simple event-based stream parser, or with DOM, a more in-depth approach that builds a tree based on the Document Object Model. Both are available under the com.sun.star.xml module of the UNO API, alongside many other tools that can be used to configure and reconfigure OOo.
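To make the SAX/DOM difference concrete, here is a small comparison using Python's standard library instead of OOoBasic (a swapped-in illustration; no UNO services are involved). The SAX version receives a stream of callbacks and keeps only the state it needs, while the DOM version builds the whole tree first and then navigates it. EMPLOYEES_XML, ElementNames, element_names_via_sax and first_name_via_dom are hypothetical names; the XML is the employee document used below.

```python
import xml.sax
from xml.dom import minidom

# The employee document used later in this post.
EMPLOYEES_XML = b"""<Employees>
   <Employee id="101">
      <Name>
         <First>John</First>
         <Last>Smith</Last>
      </Name>
      <Phone type="Home">785-555-1234</Phone>
   </Employee>
</Employees>"""


class ElementNames(xml.sax.ContentHandler):
    """SAX style: react to each start tag as it streams past."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def element_names_via_sax(data):
    handler = ElementNames()
    xml.sax.parseString(data, handler)   # events fire during this call
    return handler.names


def first_name_via_dom(data):
    """DOM style: build the whole tree in memory, then navigate it."""
    doc = minidom.parseString(data)
    first = doc.getElementsByTagName("First")[0]
    return first.firstChild.data


print(element_names_via_sax(EMPLOYEES_XML))
# ['Employees', 'Employee', 'Name', 'First', 'Last', 'Phone']
print(first_name_via_dom(EMPLOYEES_XML))   # John
```

SAX never holds the whole document, which suits large files; DOM is more convenient when you need to jump around the tree.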

So here is the code I was working with. First I needed an XML file; mine was a simple employee document:

<Employees>
   <Employee id="101">
      <Name>
         <First>John</First>
         <Last>Smith</Last>
      </Name>
      <Phone type="Home">785-555-1234</Phone>
   </Employee>
</Employees>
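Before writing any OOoBasic, it helps to see which events a SAX parser reports for a document like this, including the whitespace between tags. The sketch below logs every callback using Python's xml.sax as a stand-in for the UNO parser (EventLogger is a hypothetical name). A parser may split one piece of text into several characters() calls, so the exact number of events can vary; only the concatenated text is fixed.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record every SAX callback in the order it arrives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("startDocument",))

    def endDocument(self):
        self.events.append(("endDocument",))

    def startElement(self, name, attrs):
        self.events.append(("startElement", name))

    def endElement(self, name):
        self.events.append(("endElement", name))

    def characters(self, content):
        # Newlines and indentation between tags arrive here too.
        self.events.append(("characters", content))


logger = EventLogger()
xml.sax.parseString(b"<Name>\n  <First>John</First>\n</Name>", logger)
for event in logger.events:
    print(event)
```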

Here is the first stage of the code. We load the XML file through the first interface, the one that deals with external file access:

Sub Main
   cXmlFile = "/home/user/tmp/test.xml"
   cXmlUrl = ConvertToURL( cXmlFile )
   ReadXmlFromUrl( cXmlUrl )
End Sub

We first call the built-in ConvertToURL function, which turns the system path of the file into a URL (here, file:///home/user/tmp/test.xml), the form the UNO file services expect. Then we call the ReadXmlFromUrl routine, shown next:

Sub ReadXmlFromUrl( cUrl )
   oSFA = createUnoService( "com.sun.star.ucb.SimpleFileAccess" )
   oInputStream = oSFA.openFileRead( cUrl )
   ReadXmlFromInputStream( oInputStream )
   oInputStream.closeInput()
End Sub

This routine uses createUnoService to instantiate the com.sun.star.ucb.SimpleFileAccess service from the API. Its openFileRead method opens the file at the given URL and returns an input stream, which we store in oInputStream and pass to ReadXmlFromInputStream. Finally, we close the stream with closeInput.
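Taken together, Sub Main and ReadXmlFromUrl convert a path to a URL, open a stream on that URL, parse it, and close it. As a rough, swapped-in comparison outside OOo, here is the same flow with Python's standard library: Path.as_uri() plays the part of ConvertToURL, urlopen() the part of openFileRead, and the finally clause makes sure the stream is closed even if parsing fails (in ReadXmlFromUrl, a parser error would stop the macro before closeInput() runs). TagLogger, read_xml_from_url and main are hypothetical names.

```python
import os
import tempfile
import xml.sax
from pathlib import Path
from urllib.request import urlopen


class TagLogger(xml.sax.ContentHandler):
    """Minimal handler that records each start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def read_xml_from_url(url, handler):
    stream = urlopen(url)                # roughly oSFA.openFileRead(cUrl)
    try:
        xml.sax.parse(stream, handler)   # roughly ReadXmlFromInputStream(...)
    finally:
        stream.close()                   # like closeInput(), but always runs


def main(xml_file):
    url = Path(xml_file).resolve().as_uri()   # roughly ConvertToURL(cXmlFile)
    handler = TagLogger()
    read_xml_from_url(url, handler)
    return url, handler.tags


# Usage: write a tiny file to a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.xml")
    with open(path, "wb") as f:
        f.write(b"<Employees><Employee id='101'/></Employees>")
    url, tags = main(path)
    print(url)    # a file:// URL pointing at the temporary file
    print(tags)   # ['Employees', 'Employee']
```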

The next routine, ReadXmlFromInputStream, is the one in charge of reading the XML:

Sub ReadXmlFromInputStream( oInputStream )
   oSaxParser = createUnoService( "com.sun.star.xml.sax.Parser" )
   oDocEventsHandler = CreateDocumentHandler()
   oSaxParser.setDocumentHandler( oDocEventsHandler )
   oInputSource = createUnoStruct( "com.sun.star.xml.sax.InputSource" )
   With oInputSource
      .aInputStream = oInputStream
   End With
   oSaxParser.parseStream( oInputSource )
End Sub

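This second routine reads the XML and runs the parser itself. First we create the com.sun.star.xml.sax.Parser service and store it in oSaxParser. Then CreateDocumentHandler builds our event handler, and setDocumentHandler registers it with the parser. Finally, we wrap the input stream in a com.sun.star.xml.sax.InputSource struct (through its aInputStream field) and pass it to parseStream, which starts firing events at the handler.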
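Python's xml.sax module, used here only as a comparison and not part of OOo, follows the same three steps: create a parser, register a handler (setContentHandler plays the role of setDocumentHandler), and wrap a byte stream in an InputSource before parsing. The io.BytesIO object stands in for the UNO input stream; NameCollector and read_xml_from_stream are hypothetical names.

```python
import io
import xml.sax
from xml.sax.xmlreader import InputSource


class NameCollector(xml.sax.ContentHandler):
    """Collects end-tag names to show the handler is being called."""

    def __init__(self):
        super().__init__()
        self.closed = []

    def endElement(self, name):
        self.closed.append(name)


def read_xml_from_stream(byte_stream):
    parser = xml.sax.make_parser()      # like createUnoService("...sax.Parser")
    handler = NameCollector()
    parser.setContentHandler(handler)   # like setDocumentHandler(...)
    source = InputSource()              # like createUnoStruct("...sax.InputSource")
    source.setByteStream(byte_stream)   # like .aInputStream = oInputStream
    parser.parse(source)                # like parseStream(oInputSource)
    return handler.closed


stream = io.BytesIO(b"<Name><First>John</First><Last>Smith</Last></Name>")
print(read_xml_from_stream(stream))   # ['First', 'Last', 'Name']
```

End tags arrive innermost first, which is why Name comes last.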

Private goLocator As Object
Private glLocatorSet As Boolean

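We declare two module-level variables: goLocator, an Object that will hold the document locator, and glLocatorSet, a Boolean flag that records whether the locator has been received; CreateDocumentHandler sets it to False. Next we need an object that implements the com.sun.star.xml.sax.XDocumentHandler interface. CreateUnoListener builds one for us: each call to an interface method is forwarded to the Basic Sub whose name is the prefix DocHandler_ followed by the method name.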

Function CreateDocumentHandler()
   oDocHandler = CreateUnoListener( "DocHandler_", _
                                    "com.sun.star.xml.sax.XDocumentHandler" )
   glLocatorSet = False
   CreateDocumentHandler = oDocHandler
End Function
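CreateUnoListener wires the handler up by name: when the parser calls startElement on the listener, Basic runs the Sub named DocHandler_startElement, and so on for each XDocumentHandler method. The hypothetical Python class below imitates that prefix-based dispatch purely to illustrate the idea; it is not how OOo implements CreateUnoListener. The function names keep the Basic style on purpose.

```python
class PrefixListener:
    """Hypothetical analogue of CreateUnoListener(prefix, interface).

    Any method called on this object is looked up as a function named
    prefix + method_name in the given namespace (a dict of callables).
    """

    def __init__(self, prefix, namespace):
        self._prefix = prefix
        self._namespace = namespace

    def __getattr__(self, method_name):
        # Only reached for names not found normally, i.e. the
        # "interface methods" we want to forward.
        try:
            return self._namespace[self._prefix + method_name]
        except KeyError:
            raise AttributeError(method_name) from None


events = []


def DocHandler_startElement(name):
    events.append(("start", name))


def DocHandler_endElement(name):
    events.append(("end", name))


namespace = {f.__name__: f for f in (DocHandler_startElement, DocHandler_endElement)}
listener = PrefixListener("DocHandler_", namespace)
listener.startElement("Employee")   # runs DocHandler_startElement
listener.endElement("Employee")     # runs DocHandler_endElement
print(events)   # [('start', 'Employee'), ('end', 'Employee')]
```

In this sketch a method with no matching function raises AttributeError, which is one reason to define a handler for every method of the interface, even an empty one.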

Finally, we have the series of Subs that implement the XDocumentHandler callbacks, one for each kind of event in the XML. I have commented out the Print statements in all of these handlers except characters, which receives the text content. Unfortunately, Print reports not only the visible content, such as John, but also invisible characters such as spaces, line breaks and tabs, because the whitespace between tags is text content too.

Sub DocHandler_startDocument()
'   Print "Start document"
End Sub

Sub DocHandler_endDocument()
'   Print "End document"
End Sub

Sub DocHandler_startElement( cName As String, oAttributes As _
                             com.sun.star.xml.sax.XAttributeList )
'   Print cName
'   Print oAttributes.Length
End Sub

Sub DocHandler_endElement( cName As String )
'   Print "End element", cName
End Sub

Sub DocHandler_characters( cChars As String )
   ' Also called for the whitespace between tags
   Print "Content:", cChars
End Sub

Sub DocHandler_ignorableWhitespace( cWhitespace As String )
'   Print cWhitespace
End Sub

Sub DocHandler_processingInstruction( cTarget As String, cData As String )
End Sub

Sub DocHandler_setDocumentLocator( oLocator As com.sun.star.xml.sax.XLocator )
   goLocator = oLocator
   glLocatorSet = True
End Sub
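The handlers above mostly print. As a sketch of what they could do instead, here is a Python counterpart (standard xml.sax, not UNO; EmployeeHandler is a hypothetical name) that reads the attributes in startElement, uses the stored locator to note the line of each tag (XLocator also offers getLineNumber), and solves the whitespace problem by buffering characters and keeping only non-blank text when the element closes. It assumes simple documents without mixed content, like the employee file; in OOoBasic the same idea would use Trim() on a module-level string.

```python
import xml.sax

EMPLOYEES_XML = b"""<Employees>
   <Employee id="101">
      <Name>
         <First>John</First>
         <Last>Smith</Last>
      </Name>
      <Phone type="Home">785-555-1234</Phone>
   </Employee>
</Employees>"""


class EmployeeHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.locator = None        # plays the role of goLocator
        self.locator_set = False   # plays the role of glLocatorSet
        self.buffer = []           # text received since the last tag
        self.starts = []           # (element, line, attributes)
        self.texts = []            # (element, non-blank text)

    def setDocumentLocator(self, locator):
        self.locator = locator
        self.locator_set = True

    def startElement(self, name, attrs):
        line = self.locator.getLineNumber() if self.locator_set else None
        # Copy attributes into a plain dict; SAX APIs generally do not
        # promise the attribute list outlives the callback.
        values = {key: attrs.getValue(key) for key in attrs.getNames()}
        self.starts.append((name, line, values))
        self.buffer = []

    def characters(self, content):
        # One text node may arrive in several calls, so buffer it.
        self.buffer.append(content)

    def endElement(self, name):
        text = "".join(self.buffer).strip()
        if text:   # drop the indentation and line breaks between tags
            self.texts.append((name, text))
        self.buffer = []


handler = EmployeeHandler()
xml.sax.parseString(EMPLOYEES_XML, handler)
print(handler.texts)
# [('First', 'John'), ('Last', 'Smith'), ('Phone', '785-555-1234')]
print([s for s in handler.starts if s[2]])
# [('Employee', 2, {'id': '101'}), ('Phone', 7, {'type': 'Home'})]
```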

I must admit that the process is not very clear yet, but reviewing it we can see that we want 3 things:

  1. Invoke the XML parsing service
  2. Send our call to a window inside OOo