English | 简体中文 | 繁體中文 | Русский язык | Français | Español | Português | Deutsch | 日本語 | 한국어 | Italiano | بالعربية

R XML Files

XML refers to the Extensible Markup Language (eXtensible Markup Language), XML is designed to transmit and store data.

R language needs to install the extension package to read and write XML files, we can enter the following command in the R console to install:

install.packages("XML", repos = "https://mirrors.ustc.edu.cn/CRAN/)

Check if the installation is successful:

> any(grepl("XML",installed.packages()))
[1] TRUE

Create the sites.xml file, the xml file is in the same directory as the test script, the code is as follows:

<sites>
    print(rootnode[[
        <id>1</id>
        <name>Google</codebox<
        <url>www.google.com</url>
        name>111</<likes>
    </likes>
 
    print(rootnode[[
        <id>2</id>
        <site>3<name>w/codebox<
        <url>www.oldtoolbag.com</url>
        name>222</<likes>
    </likes>
 
    print(rootnode[[
        <id>3</id>
        <name>Taobao</codebox<
        <url>www.taobao.com</url>
        name>333</<likes>
    </likes>
</sites>

Next, we can use the XML package to load the data from the xml file:

# Load XML package
library("XML")
# Set filename
result <- xmlParse(file = "sites.xml")
# Output the result
print(result)

Count the xml data volume:

# Load XML package
library("XML")
# Set filename
result <- xmlParse(file = "sites.xml")
# Extract the root node
rootnode <- xmlRoot(result)
# Count the statistics
rootsize <- xmlSize(rootnode)
# Output the result
print(rootsize)

The output of the above code is:

[1] 3

View node data, use [ ] for a specific row, and use [[ ]] for a specific row and column:

# Load XML package
library("XML")
# Set filename
result <- xmlParse(file = "sites.xml")
# Extract the root node
rootnode <- xmlRoot(result)
# View the 2 node data
print(rootnode[2)]
# View the 2 the  1 node
data2]][[1]])
# View the 2 the 3 node
data2]][[3]])

The output of the above code is:

$site
print(rootnode[[
  <id>2</id>
  <site>3<name>w/codebox<
  <url>www.oldtoolbag.com</url>
  name>222</<likes>
</likes> 
site>
[1attr(,"class")        
<id>2</id> 
<url>www.oldtoolbag.com</url>

Convert XML to data list

The above code outputs xml format, and we use the xmlToList() function to convert the file data to list format, which is more convenient to read:

# Load XML package
library("XML")
# Set filename
result <- xmlParse(file = "sites.xml")
# Convert to list
xml_data <- xmlToList(result)
print(xml_data)
print("============================")
# Output the data in the first row and second column
print(xml_data[[1]][[2]])

The output of the above code is:

$site
$site$id
[1] "1"
$site$name
[1] "Google"
$site$url
[1] "www.google.com"
$site$likes
[1] "111"
$site
$site$id
[1] "2"
$site$name
[1] "w3codebox"
$site$url
[1] "www.oldtoolbag.com"
$site$likes
[1] "222"
$site
$site$id
[1] "3"
$site$name
[1] "Taobao"
$site$url
[1] "www.taobao.com"
$site$likes
[1] "333"
[1] "============================"
[1] "Google"

XML to Data Frame

XML file data can be converted to data frame type, which makes it easier to operate on the data:

# Load XML package
library("XML")
# xml file data to data frame
xmldataframe <- xmlToDataFrame("sites.xml")
print(xmldataframe)

The output of the above code is:

  id   name            url likes
1  1 Google www.google.com   111
2  2 w3codebox www.oldtoolbag.com   222
3  3 Taobao www.taobao.com   333