As a professional statistical tool, R would be of little use if data could only be entered and exported by hand, so it supports reading data in bulk from mainstream tabular file formats such as CSV, Excel, and XML.
CSV (Comma-Separated Values, sometimes also called Character-Separated Values because the separator does not have to be a comma) is a very popular tabular file format, suitable for storing small and medium-sized data sets.
Since most software supports this file format, it is commonly used for data storage and exchange.
A CSV file is plain text with a very simple format: each line holds one record, each record is split into fields by a delimiter, and every record has the same sequence of fields.
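As a small illustration of a separator other than a comma, the sketch below reads semicolon-separated text. The inline data and the variable names raw and semi are made up for this example; sep is a standard argument of read.csv().

```r
# Hypothetical semicolon-separated data, supplied inline through the text argument
raw <- "id;name;likes
1;Google;111
2;Taobao;333"

# sep tells read.csv() which character separates the fields
semi <- read.csv(text = raw, sep = ";")
print(semi)
```

The same call without sep = ";" would treat each line as a single field, because the default separator is a comma.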
The following is a simple sites.csv file (stored in the same directory as the test program):
id,name,url,likes
1,Google,www.google.com,111
2,w3codebox,www.oldtoolbag.com,222
3,Taobao,www.taobao.com,333
CSV uses commas to separate columns. If a field itself contains a comma, the entire field should be enclosed in double quotes.
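The quoting rule can be checked with a short sketch. The company name and the variable names raw and quoted are invented for illustration and are not part of sites.csv.

```r
# Hypothetical data: the first name contains a comma, so it is wrapped in double quotes
raw <- 'id,name,likes
1,"Example, Inc.",10
2,Plain,20'

quoted <- read.csv(text = raw)

# The quoted field is kept as a single value instead of being split in two
print(quoted$name)
print(ncol(quoted))
```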
Note: For text containing non-English characters, pay attention to the encoding. Many computers commonly use UTF-8, so the sample file here is saved as UTF-8.
Note: The CSV file should end with a newline after the last record; otherwise, R displays a warning message like the following:
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'sites.csv'
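One way to make sure the file ends with a newline is to create it from R with writeLines(), which terminates every element with a line separator. This is only a sketch: it writes the sample data to a temporary directory (an assumption, so that no existing sites.csv is overwritten), and the names path and lines are illustrative.

```r
# Write the sample file into a temporary directory (assumed location for this sketch)
path <- file.path(tempdir(), "sites.csv")
lines <- c(
  "id,name,url,likes",
  "1,Google,www.google.com,111",
  "2,w3codebox,www.oldtoolbag.com,222",
  "3,Taobao,www.taobao.com,333"
)

# writeLines() ends each element, including the last one, with a newline
writeLines(lines, path)

data <- read.csv(path)
print(nrow(data))
```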
Next, we can use the read.csv() function to read the data from the CSV file:
data <- read.csv("sites.csv", encoding="UTF-8")
print(data)
If the encoding argument is not set, read.csv() assumes the operating system's default text encoding. On a Chinese-language Windows system where the default encoding has not been changed, that is usually GBK. Keep text encodings as consistent as possible to prevent errors.
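The following is a minimal sketch of reading non-ASCII text that was saved as UTF-8. The temporary file, the sample row, and the variable names are assumptions made for illustration. According to the read.csv() documentation, encoding only marks the strings that are read as UTF-8; it does not convert the file.

```r
# Hypothetical one-row file containing non-ASCII text, written as raw UTF-8 bytes
path <- tempfile(fileext = ".csv")
rows <- c("id,name", "1,\u4e2d\u6587")
writeLines(enc2utf8(rows), path, useBytes = TRUE)

# Declare that the strings in the file are UTF-8
utf <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

# Shows how R has marked the encoding of the value that was read
print(Encoding(utf$name))
```

If the file were stored in a different encoding, the fileEncoding argument of read.csv() could be used to re-encode it while reading.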
The output of the above code is:
  id      name                url likes
1  1    Google     www.google.com   111
2  2 w3codebox www.oldtoolbag.com   222
3  3    Taobao     www.taobao.com   333
The read.csv() function returns a data frame, allowing us to easily perform statistical processing on the data. In the following example, we check the number of rows and columns:
data <- read.csv("sites.csv", encoding="UTF-8")
print(is.data.frame(data))  # Check whether it is a data frame
print(ncol(data))           # Number of columns
print(nrow(data))           # Number of rows
The output of the above code is:
[1] TRUE
[1] 4
[1] 3
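As a complement, base R offers a few other ways to inspect a data frame. The sketch below supplies the sample data inline, so it does not depend on sites.csv being present; that is a convenience for this example only.

```r
# Same content as sites.csv, supplied inline for a self-contained sketch
data <- read.csv(text = "id,name,url,likes
1,Google,www.google.com,111
2,w3codebox,www.oldtoolbag.com,222
3,Taobao,www.taobao.com,333")

print(dim(data))    # Number of rows, then number of columns
print(names(data))  # Column names taken from the header line
str(data)           # Compact overview of each column's type and values
```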
The following example finds the largest value in the likes field:
data <- read.csv("sites.csv", encoding="UTF-8")
# Largest value of likes
like <- max(data$likes)
print(like)
The output of the above code is:
[1] 333
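Other summaries work the same way. The sketch below uses which.max() to find the row that holds the largest value and mean() for the average. The data is supplied inline, and the variable names top and avg are illustrative.

```r
# Same content as sites.csv, supplied inline for a self-contained sketch
data <- read.csv(text = "id,name,url,likes
1,Google,www.google.com,111
2,w3codebox,www.oldtoolbag.com,222
3,Taobao,www.taobao.com,333")

# which.max() returns the position of the largest value, used here to select the row
top <- data[which.max(data$likes), ]
print(top)

# Average of the likes column
avg <- mean(data$likes)
print(avg)
```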
We can also query data with conditions, similar to the SQL WHERE clause, by using the subset() function.
The following example finds the rows where likes equals 222:
data <- read.csv("sites.csv", encoding="UTF-8")
# Rows where likes is 222
retval <- subset(data, likes == 222)
print(retval)
The output of the above code is:
  id      name                url likes
2  2 w3codebox www.oldtoolbag.com   222
Note: Equality comparisons in a condition use ==, not =.
Multiple conditions are combined with the & operator. The following example finds the rows where likes is greater than 1 and name is w3codebox:
data <- read.csv("sites.csv", encoding="UTF-8")
# Rows where likes is greater than 1 and name is w3codebox
retval <- subset(data, likes > 1 & name == "w3codebox")
print(retval)
The output of the above code is:
  id      name                url likes
2  2 w3codebox www.oldtoolbag.com   222
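For comparison, the same filter can be written with bracket indexing, and conditions can also be combined with | (or). This is a sketch with the sample data supplied inline; the variable names same and either are illustrative.

```r
# Same content as sites.csv, supplied inline for a self-contained sketch
data <- read.csv(text = "id,name,url,likes
1,Google,www.google.com,111
2,w3codebox,www.oldtoolbag.com,222
3,Taobao,www.taobao.com,333")

# Bracket indexing: a logical vector selects rows, the empty column slot keeps all columns
same <- data[data$likes > 1 & data$name == "w3codebox", ]
print(same)

# | keeps rows that satisfy at least one of the conditions
either <- subset(data, likes == 111 | name == "Taobao")
print(either)
```

Inside subset() the column names can be used directly, while bracket indexing needs the data$ prefix.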
R can save data as a CSV file with the write.csv() function.
Continuing the example above, we save the rows where likes is 222 to the file w3codebox.csv:
data <- read.csv("sites.csv", encoding="UTF-8")
# Rows where likes is 222
retval <- subset(data, likes == 222)
# Write to a new file
write.csv(retval, "w3codebox.csv")
newdata <- read.csv("w3codebox.csv")
print(newdata)
The output of the above code is:
  X id      name                url likes
1 2  2 w3codebox www.oldtoolbag.com   222
The X column holds the row names that write.csv() stores by default. It can be removed with the parameter row.names = FALSE:
data <- read.csv("sites.csv", encoding="UTF-8")
# Rows where likes is 222
retval <- subset(data, likes == 222)
# Write to a new file without the row names
write.csv(retval, "w3codebox.csv", row.names = FALSE)
newdata <- read.csv("w3codebox.csv")
print(newdata)
The output of the above code is:
  id      name                url likes
1  2 w3codebox www.oldtoolbag.com   222
After execution, we can see that the w3codebox.csv file was generated successfully.
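Whether the file was written can also be checked from R itself. The sketch below is self-contained: it supplies the sample data inline and writes to a temporary directory instead of the working directory (an assumption for illustration). The variable names out and existed are illustrative.

```r
# Same content as sites.csv, supplied inline for a self-contained sketch
data <- read.csv(text = "id,name,url,likes
1,Google,www.google.com,111
2,w3codebox,www.oldtoolbag.com,222
3,Taobao,www.taobao.com,333")
retval <- subset(data, likes == 222)

# Write to a temporary location (assumed for this sketch) without row names
out <- file.path(tempdir(), "w3codebox.csv")
write.csv(retval, out, row.names = FALSE)

# file.exists() reports whether the file is present on disk
existed <- file.exists(out)
print(existed)

# Read the file back and confirm that the columns survived the round trip
newdata <- read.csv(out)
print(names(newdata))

# Remove the temporary file again
unlink(out)
```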