Thursday, June 28, 2012

Perl One-Liner for Dumping XML Files

     Whenever we work with an XML file, the first thing we want to know is how the XML is organized.  Once we understand the structure of the file, it is easier to write code to process the XML data.  Perl developers commonly use the Dumper function from the Data::Dumper module to do this inside their code.  But if you have the XML file at hand and just want to take a peek at how the data is organized, you don’t have to write a program.  Instead, run the following one-liner with the XML file name as the last argument.

perl -MData::Dumper -MXML::Simple -e 'print Dumper(new XML::Simple->XMLin($ARGV[0]));' XML_FILE_NAME

-M - Followed by a Perl module name, loads that module before the code runs (like a use statement).  Here we are loading the modules Data::Dumper and XML::Simple.
-e - The quoted string following this option is executed as Perl code.

Now examining the executable code itself,

$ARGV[0] is the first command-line argument passed to the perl command after all of its options; here, that is the XML file name.

The XML::Simple module provides the XMLin function, which takes an XML file as its argument and returns a reference to a data structure that holds the XML data in a more structured and accessible form.

The Dumper function takes any Perl variable or reference as its argument and returns it as a structured, printable string.

So the executable code,

 * calls XMLin, made available by the XML::Simple module, and
 * passes the XML file name as its argument (which was provided as the command-line argument to perl)
 * XMLin returns the XML as a data structure reference, which is
 * passed as an argument to the Dumper function made available by the Data::Dumper module, which
 * returns a printable string representation of that reference, and
 * the print command just prints it out.
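The same steps can be spelled out one statement at a time. This is only a sketch: the function name xml_dump_steps and the variable names are hypothetical, and it assumes the one-liner amounts to creating an XML::Simple object and calling XMLin on it. It needs XML::Simple to be installed.

```shell
# Hypothetical helper: the one-liner, unrolled into separate steps.
xml_dump_steps() {
    perl -MData::Dumper -MXML::Simple -e '
        my $file   = $ARGV[0];              # file name from the command line
        my $parser = XML::Simple->new;      # create the parser object
        my $data   = $parser->XMLin($file); # parse the file into a nested structure
        my $text   = Dumper($data);         # render that structure as a string
        print $text;                        # print it
    ' "$1"
}
```

Running xml_dump_steps dummy.xml should then print the same kind of dump as the one-liner.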

A small example

safeer@penguinpower:~$ perl -MData::Dumper -MXML::Simple -e 'print Dumper(new XML::Simple->XMLin($ARGV[0]));'  dummy.xml  
$VAR1 = { 
         'language' => { 
               'perl' => { 
                   'version' => '5.8', 
                   'content' => 'Practical Extraction and Report Language'          
                            },
          'name' => 'perl'  
                         } 
        }; 

To make life easier, I have wrapped it in a bash function and added it to my .bashrc.

safeer@penguinpower:~$ type xmlDump 
xmlDump is a function 
xmlDump () {     
XML=${1?You should provide an xml file};   
perl -MData::Dumper -MXML::Simple -e 'print Dumper(new XML::Simple->XMLin($ARGV[0]));' "$XML"
}
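
If you want the wrapper to be a little more defensive, one possible variant is sketched below. The name xml_dump, the messages and the return codes are my own choices for illustration, not part of the function above. It checks that exactly one argument was given and that the file is readable, and it quotes the file name so that paths with spaces survive.

```shell
# Hypothetical, more defensive variant of the xmlDump wrapper.
xml_dump() {
    # Require exactly one argument: the XML file name.
    if [ "$#" -ne 1 ]; then
        echo "usage: xml_dump XML_FILE" >&2
        return 2
    fi
    # Fail early with a clear message if the file cannot be read.
    if [ ! -r "$1" ]; then
        echo "xml_dump: cannot read '$1'" >&2
        return 1
    fi
    # Quote the file name so that paths containing spaces stay in one piece.
    perl -MData::Dumper -MXML::Simple -e 'print Dumper(XML::Simple->new->XMLin($ARGV[0]));' "$1"
}
```

Because it uses return rather than exit, a bad call reports the problem without closing the shell it was sourced into.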