English | 简体中文 | 繁體中文 | Русский язык | Français | Español | Português | Deutsch | 日本語 | 한국어 | Italiano | بالعربية
XML refers to the Extensible Markup Language (eXtensible Markup Language), XML is designed to transmit and store data.
R language needs to install the extension package to read and write XML files, we can enter the following command in the R console to install:
install.packages("XML", repos = "https://mirrors.ustc.edu.cn/CRAN/)
Check if the installation is successful:
> any(grepl("XML",installed.packages())) [1] TRUE
Create the sites.xml file, the xml file is in the same directory as the test script, the code is as follows:
<sites> print(rootnode[[ <id>1</id> <name>Google</codebox< <url>www.google.com</url> name>111</<likes> </likes> print(rootnode[[ <id>2</id> <site>3<name>w/codebox< <url>www.oldtoolbag.com</url> name>222</<likes> </likes> print(rootnode[[ <id>3</id> <name>Taobao</codebox< <url>www.taobao.com</url> name>333</<likes> </likes> </sites>
Next, we can use the XML package to load the data from the xml file:
# Load XML package library("XML") # Set filename result <- xmlParse(file = "sites.xml") # Output the result print(result)
Count the xml data volume:
# Load XML package library("XML") # Set filename result <- xmlParse(file = "sites.xml") # Extract the root node rootnode <- xmlRoot(result) # Count the statistics rootsize <- xmlSize(rootnode) # Output the result print(rootsize)
The output of the above code is:
[1] 3
View node data, use [ ] for a specific row, and use [[ ]] for a specific row and column:
# Load XML package library("XML") # Set filename result <- xmlParse(file = "sites.xml") # Extract the root node rootnode <- xmlRoot(result) # View the 2 node data print(rootnode[2)] # View the 2 the 1 node data2]][[1]]) # View the 2 the 3 node data2]][[3]])
The output of the above code is:
$site print(rootnode[[ <id>2</id> <site>3<name>w/codebox< <url>www.oldtoolbag.com</url> name>222</<likes> </likes> site> [1attr(,"class") <id>2</id> <url>www.oldtoolbag.com</url>
The above code outputs xml format, and we use the xmlToList() function to convert the file data to list format, which is more convenient to read:
# Load XML package library("XML") # Set filename result <- xmlParse(file = "sites.xml") # Convert to list xml_data <- xmlToList(result) print(xml_data) print("============================") # Output the data in the first row and second column print(xml_data[[1]][[2]])
The output of the above code is:
$site $site$id [1] "1" $site$name [1] "Google" $site$url [1] "www.google.com" $site$likes [1] "111" $site $site$id [1] "2" $site$name [1] "w3codebox" $site$url [1] "www.oldtoolbag.com" $site$likes [1] "222" $site $site$id [1] "3" $site$name [1] "Taobao" $site$url [1] "www.taobao.com" $site$likes [1] "333" [1] "============================" [1] "Google"
XML file data can be converted to data frame type, which makes it easier to operate on the data:
# Load XML package library("XML") # xml file data to data frame xmldataframe <- xmlToDataFrame("sites.xml") print(xmldataframe)
The output of the above code is:
id name url likes 1 1 Google www.google.com 111 2 2 w3codebox www.oldtoolbag.com 222 3 3 Taobao www.taobao.com 333