
Ruby XML, XSLT, and XPath Tutorial

What is XML?

XML stands for eXtensible Markup Language.

Extensible Markup Language, a subset of Standard Generalized Markup Language (SGML), is a markup language used to give structure to electronic documents.

It can be used to mark up data and define data types, and it is a meta-language that allows users to define their own markup languages. It is well suited to transmission over the web, providing a uniform way to describe and exchange structured data independently of applications or vendors.

For more information, please see our XML tutorial

XML parser structure and API

There are mainly two types of XML parsers: DOM and SAX.

  • The SAX parser is event-based. It scans the XML document from start to end, and each time it encounters a syntactic construct (such as a start tag, an end tag, or text), it calls the event handler for that construct, sending an event to the application.

  • The DOM (Document Object Model) parser builds the hierarchical structure of the document as a tree in memory and represents each node of the tree as an object. Once parsing is complete, the entire DOM tree of the document is held in memory.
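
To make the difference concrete, here is a minimal sketch that reads the same data both ways with REXML (the library introduced in the next section). The inlined sample data and the TitleCollector class are assumptions made for illustration, not part of the original tutorial.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical sample data, inlined so the sketch is self-contained.
xml_sample = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><format>DVD</format></movie>
    <movie title="Ishtar"><format>VHS</format></movie>
  </collection>
XML

# DOM style: the whole document is parsed into a tree first,
# and the tree can then be walked in any order.
doc = REXML::Document.new(xml_sample)
dom_titles = doc.root.elements.to_a("movie").map { |m| m.attributes["title"] }

# SAX style: the parser calls back into a listener while it reads.
# No tree is built, so the listener has to keep its own state.
class TitleCollector
  include REXML::StreamListener
  attr_reader :titles

  def initialize
    @titles = []
  end

  # Called once for every opening tag, with its attributes as a Hash.
  def tag_start(name, attrs)
    @titles << attrs["title"] if name == "movie"
  end
end

collector = TitleCollector.new
REXML::Document.parse_stream(xml_sample, collector)
sax_titles = collector.titles

p dom_titles
p sax_titles
```

Both approaches collect the same titles here. The DOM version keeps the full tree available for later queries, while the stream version only ever holds what the listener chooses to store.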

Parsing and creating XML in Ruby

The REXML library can be used to parse XML documents in Ruby.

REXML is an XML toolkit written in pure Ruby that conforms to the XML 1.0 specification.

REXML has been distributed with Ruby since version 1.8 (since Ruby 3.0 it ships as a bundled gem rather than as part of the default library).

The path to the REXML library is: rexml/document

All methods and classes are encapsulated within a REXML module.
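
As a small illustration of the module namespace, the following sketch creates a document from scratch using fully qualified names (no `include REXML`) and then parses the generated text again. The element names are borrowed from the movies.xml example used later in this tutorial; the variable names are assumptions for illustration.

```ruby
require 'rexml/document'

# Build a small document from scratch. Without `include REXML`,
# every class is referenced through the REXML module.
doc = REXML::Document.new
root = doc.add_element("collection", { "shelf" => "New Arrivals" })

movie = root.add_element("movie", { "title" => "Enemy Behind" })
movie.add_element("type").text = "War, Thriller"
movie.add_element("format").text = "DVD"

# Serialize the tree to a String.
xml_text = doc.to_s
puts xml_text

# Round trip: parse the generated text and read a value back.
reparsed = REXML::Document.new(xml_text)
format_text = reparsed.root.elements["movie/format"].text
puts format_text
```

Keeping the `REXML::` prefix avoids pulling names such as Document and Element into the top-level namespace, which can matter in larger programs that define classes with the same names.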

The REXML parser has the following advantages over other parsers:

  • It is written 100% in Ruby.

  • It supports both SAX-style and DOM-style parsing.

  • It is lightweight, at fewer than 2000 lines of code.

  • Its methods and classes are easy to understand.

  • It offers a SAX2-based API and full XPath support.

  • It ships with the Ruby installation, so no separate installation is required.

The following is an example of XML code, saved as movies.xml:

<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>

DOM parser

Let's parse the XML data first. We start by requiring the rexml/document library; for convenience, REXML is often included into the top-level namespace:

Online Example

#!/usr/bin/ruby -w
 
require 'rexml/document'
include REXML
 
xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)
 
# Get the root element
root = xmldoc.root
puts "Root element: " + root.attributes["shelf"]
 
# The following will output movie titles
xmldoc.elements.each("collection/movie") {
   |e| puts "Movie Title: " + e.attributes["title"]
}
 
# The following will output all movie types
xmldoc.elements.each("collection/movie/type") {
   |e| puts "Movie Type: " + e.text
}
 
# The following will output all movie descriptions
xmldoc.elements.each("collection/movie/description") {
   |e| puts "Movie Description: " + e.text
}

The above example produces the following output:

Root element: New Arrivals
Movie Title: Enemy Behind
Movie Title: Transformers
Movie Title: Trigun
Movie Title: Ishtar
Movie Type: War, Thriller
Movie Type: Anime, Science Fiction
Movie Type: Anime, Action
Movie Type: Comedy
Movie Description: Talk about a US-Japan war
Movie Description: A schientific fiction
Movie Description: Vash the Stampede!
Movie Description: Viewable boredom
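
Because the DOM tree stays in memory, it can also be walked generically instead of with one query per field. The sketch below, which assumes an inlined excerpt of movies.xml and hypothetical variable names, turns each movie element into a Hash and copes with children that only some movies have (year versus episodes).

```ruby
require 'rexml/document'

# Hypothetical excerpt of movies.xml, inlined to keep the sketch self-contained.
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind">
      <format>DVD</format>
      <year>2003</year>
      <stars>10</stars>
    </movie>
    <movie title="Trigun">
      <format>DVD</format>
      <episodes>4</episodes>
      <stars>10</stars>
    </movie>
  </collection>
XML

doc = REXML::Document.new(xml)

# Turn each <movie> into a Hash: the title attribute plus one
# entry per child element, whatever those children happen to be.
movies = doc.root.get_elements("movie").map do |movie|
  record = { "title" => movie.attributes["title"] }
  movie.elements.each { |child| record[child.name] = child.text }
  record
end

# Absent elements simply have no key, so fetch with a default.
movies.each do |m|
  puts "#{m['title']}: year #{m.fetch('year', 'unknown')}"
end
```

All values are read as strings; converting "10" or "2003" to integers is left to the caller.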

SAX parser

We will process the same data file, movies.xml. SAX parsing is not recommended for a file this small; it is used here only as a simple demonstration:

Online Example

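#!/usr/bin/ruby -w
 
require 'rexml/document'
require 'rexml/streamlistener'
include REXML
 
class MyListener
  include REXML::StreamListener

  # Called for every opening tag with its name and attribute hash
  def tag_start(*args)
    puts "tag_start: #{args.map {|x| x.inspect}.join(', ')}"
  end
 
  # Called for every run of character data
  def text(data)
    # Note: \w matches word characters, not whitespace. This skips the
    # whitespace between tags (it contains an empty line) and also
    # single-word values such as "DVD" or "2003".
    return if data =~ /^\w*$/
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts "    text: #{abbrev.inspect}"
  end
end
 
list = MyListener.new
xmlfile = File.new("movies.xml")
Document.parse_stream(xmlfile, list)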

The above example produces the following output:

tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
  text: "War, Thriller"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text: "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
  text: "Anime, Science Fiction"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text: "A schientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
  text: "Anime, Action"
tag_start: "format", {}
tag_start: "episodes", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text: "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
tag_start: "format", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text: "Viewable boredom"
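
A listener usually needs to remember where it is in the document, since no tree is available to consult. The following sketch shows one way to do that by tracking the open tag with tag_start and tag_end. The TypeListener class and the inlined sample data are assumptions for illustration only.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Collects the text of every <type> element, keyed by movie title,
# by remembering which movie and which tag are currently open.
class TypeListener
  include REXML::StreamListener
  attr_reader :types

  def initialize
    @types = {}
    @current_title = nil
    @in_type = false
  end

  def tag_start(name, attrs)
    @current_title = attrs["title"] if name == "movie"
    @in_type = true if name == "type"
  end

  def tag_end(name)
    @in_type = false if name == "type"
  end

  # Text outside <type> (including whitespace between tags) is ignored.
  def text(data)
    return unless @in_type
    @types[@current_title] = (@types[@current_title] || "") + data
  end
end

# Hypothetical excerpt of movies.xml.
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><type>War, Thriller</type><format>DVD</format></movie>
    <movie title="Ishtar"><type>Comedy</type><format>VHS</format></movie>
  </collection>
XML

listener = TypeListener.new
REXML::Document.parse_stream(xml, listener)
types = listener.types
p types
```

Unlike the regular-expression filter in the example above, this version decides what to keep from the element it is inside, so single-word values such as "Comedy" are retained.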

XPath and Ruby

We can use XPath to query XML. XPath is a language for finding information in XML documents (see: XPath Tutorial).

XPath, the XML Path Language, is used to address parts of an XML document. It is based on the tree structure of XML and provides the ability to locate nodes within that tree.

Ruby supports XPath through REXML's XPath class, which operates on the parsed tree (Document Object Model).

Online Example

#!/usr/bin/ruby -w
 
require 'rexml/document'
include REXML
 
xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)
 
# Information of the first movie
movie = XPath.first(xmldoc, "//movie")
p movie
 
# Print all movie types
XPath.each(xmldoc, "//type") {|e| puts e.text}
 
# Get all the movie formats, returned as an array
names = XPath.match(xmldoc, "//format").map {|x| x.text}
p names

The above example produces the following output:

<movie title='Enemy Behind'> ... </>
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
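
XPath expressions can also filter nodes with predicates. The sketch below is a minimal illustration using an inlined excerpt of movies.xml; the chosen queries and variable names are assumptions, not part of the original example.

```ruby
require 'rexml/document'

# Hypothetical excerpt of movies.xml, inlined to keep the sketch self-contained.
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><format>DVD</format><stars>10</stars></movie>
    <movie title="Trigun"><format>DVD</format><stars>10</stars></movie>
    <movie title="Ishtar"><format>VHS</format><stars>2</stars></movie>
  </collection>
XML

doc = REXML::Document.new(xml)

# Attribute predicate: the <stars> child of the movie titled "Trigun".
stars = REXML::XPath.first(doc, "//movie[@title='Trigun']/stars")
puts stars.text

# Child-element predicate: every movie whose <format> is DVD.
dvd_titles = REXML::XPath.match(doc, "//movie[format='DVD']")
                         .map { |m| m.attributes["title"] }
p dvd_titles

# XPath.first returns nil when nothing matches, so check before using the result.
missing = REXML::XPath.first(doc, "//movie[@title='No Such Film']")
puts missing.nil?
```

Pushing the filter into the XPath expression keeps the Ruby side short, at the cost of building query strings; any user-supplied value interpolated into such a string would need careful quoting.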

XSLT and Ruby

Two XSLT processors are available for Ruby. Each is described briefly below:

Ruby-Sablotron

This parser is written and maintained by Masayoshi Takahashi. It is mainly written for the Linux operating system and requires the following libraries:

  • Sablot

  • Iconv

  • Expat

You can find these libraries at Ruby-Sablotron.

XSLT4R

XSLT4R was written by Michael Neumann. It provides a simple command-line interface and can also be used from within third-party applications to transform XML documents.

XSLT4R needs XMLScan to operate. XMLScan is included in the XSLT4R archive and is also a 100% Ruby module. These modules can be installed using the standard Ruby installation method (i.e., ruby install.rb).

XSLT4R syntax format is as follows:

ruby xslt.rb stylesheet.xsl document.xml [arguments]

If you want to use XSLT4R in your application, you can require XSLT and pass in the parameters you need. Here is an example:

Online Example

require "xslt"
 
# Read the stylesheet and the XML document as strings
stylesheet = File.read("stylesheet.xsl")
xml_doc = File.read("document.xml")
arguments = { 'image_dir' => '/....' }
 
sheet = XSLT::Stylesheet.new(stylesheet, arguments)
 
# output to StdOut
sheet.apply(xml_doc)
 
# output to 'str'
str = ""
sheet.output = [str]
sheet.apply(xml_doc)

More Information