
R CSV Files

As a professional statistical tool, R would be of little use if data could only be entered and exported by hand, so it supports reading data in bulk from mainstream tabular storage formats (such as CSV, Excel, and XML).

Working with CSV tables

CSV (Comma-Separated Values, sometimes also called Character-Separated Values because the separator does not have to be a comma) is a very popular file format for storing tables, suitable for small and medium-sized data sets.

Since most software supports this file format, it is commonly used for storing and exchanging data.

CSV is essentially text, with an extremely simple file format: data is saved line by line in text, with each record separated by delimiters into fields, and each record has the same field sequence.

The following is a simple sites.csv file (stored in the same directory as the test program):

id,name,url,likes
1,Google,www.google.com,111
2,w3codebox,www.oldtoolbag.com,222
3,Taobao,www.taobao.com,333

CSV uses commas to separate columns. If a field itself contains a comma, the entire field should be enclosed in double quotes.
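As a minimal sketch of the quoting rule, the snippet below reads CSV text supplied inline through the text argument of read.csv() instead of from a file. The value "Google, Inc." is a made-up field used only to show a comma inside double quotes, and stringsAsFactors = FALSE is set so the result is the same across R versions.

```r
# Inline CSV text standing in for a file. The name in the first record
# contains a comma, so that field is wrapped in double quotes.
csv_text <- 'id,name,likes
1,"Google, Inc.",111
2,Taobao,333'

quoted <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The quoted comma stays inside the name field instead of starting a new column
print(quoted$name)
print(ncol(quoted))
```

Without the double quotes, the first record would have four fields while the header has only three, so the columns would no longer line up.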

Note: When the text contains non-English characters, pay attention to the file encoding. UTF-8 is the encoding most commonly used across systems, so the example file here is saved as UTF-8.

Note: The last line of the CSV file must end with a newline (in most editors this looks like an empty line at the end of the file); otherwise, R displays a warning message when reading it.

Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
  Incomplete final line found by readTableHeader on 'sites.csv'
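One way to avoid this warning when the file is generated from R is to write it with writeLines(), which ends every line, including the last one, with a newline. The sketch below assumes a temporary file instead of sites.csv and checks the last byte of the file directly.

```r
# Temporary file used only for this illustration
path <- tempfile(fileext = ".csv")

# writeLines() terminates each element, including the last, with a newline
writeLines(c("id,name,url,likes", "1,Google,www.google.com,111"), path)

# Read the raw bytes and check that the file ends with a newline (byte 0x0a)
raw_bytes <- readBin(path, what = "raw", n = file.size(path))
ends_with_newline <- raw_bytes[length(raw_bytes)] == as.raw(10L)
print(ends_with_newline)

# With a complete final line, the file can be read without the warning above
checked <- read.csv(path, stringsAsFactors = FALSE)
print(nrow(checked))
```

For a file edited by hand, pressing Enter once after the last record has the same effect.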

Read CSV file

Next, we can use the read.csv() function to read the data from the CSV file:

data <- read.csv("sites.csv", encoding="UTF-8")
print(data)

If the encoding argument is not set, read.csv() assumes the operating system's default text encoding. On a Chinese-language Windows system where the default has not been changed, that encoding is usually GBK. To prevent garbled text, keep the encoding of the file and the encoding passed to read.csv() consistent.
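To find out which default encoding applies on a particular machine, the current locale can be inspected from within R. This is only a small sketch using base R functions; the values printed depend entirely on the system it runs on.

```r
# Locale that determines the native text encoding of this R session
print(Sys.getlocale("LC_CTYPE"))

# l10n_info() reports, among other things, whether the native encoding is UTF-8
native_is_utf8 <- l10n_info()[["UTF-8"]]
print(native_is_utf8)
```

If the second value is FALSE and the file is saved as UTF-8, stating the encoding explicitly when reading is the safer choice.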

The output of the above code is:

  id      name                url likes
1  1    Google     www.google.com   111
2  2 w3codebox www.oldtoolbag.com   222
3  3    Taobao     www.taobao.com   333

The read.csv() function returns a data frame, allowing us to easily perform statistical processing on the data. In the following example, we check the number of rows and columns:

data <- read.csv("sites.csv", encoding="UTF-8")
print(is.data.frame(data)) # Check if it is a data frame
print(ncol(data)) # Number of columns
print(nrow(data)) # Number of rows

The output of the above code is:

[1] TRUE
[1] 4
[1] 3
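The same information can be gathered in fewer calls. The following sketch assumes the contents of sites.csv are supplied inline through the text argument so that it runs without the file; dim() returns the row and column counts together, names() lists the column names, and sapply() with class shows the type R inferred for each column.

```r
# Same records as sites.csv, supplied inline so the sketch is self-contained
csv_text <- "id,name,url,likes
1,Google,www.google.com,111
2,w3codebox,www.oldtoolbag.com,222
3,Taobao,www.taobao.com,333"

sites <- read.csv(text = csv_text, stringsAsFactors = FALSE)

print(dim(sites))            # number of rows, then number of columns
print(names(sites))          # column names taken from the header line
print(sapply(sites, class))  # type inferred for each column
```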

The following example finds the largest value in the likes field:

data <- read.csv("sites.csv", encoding="UTF-8")
# Largest value in the likes column
like <- max(data$likes)
print(like)

The output of the above code is:

[1] 333
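max() returns only the value itself. If the whole record with the most likes is wanted, which.max() gives the position of that value, which can then be used as a row index. A minimal sketch, again assuming the sites.csv records are supplied inline:

```r
# Same records as sites.csv, supplied inline so the sketch is self-contained
csv_text <- "id,name,url,likes
1,Google,www.google.com,111
2,w3codebox,www.oldtoolbag.com,222
3,Taobao,www.taobao.com,333"

sites <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# which.max() returns the index of the first maximum in the likes column
top_row <- sites[which.max(sites$likes), ]
print(top_row)
```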

We can also specify search conditions, similar to the SQL WHERE clause, to query data, and the function required is subset().

The following example finds the records whose likes value is 222:

data <- read.csv("sites.csv", encoding="UTF-8")
# Records where likes is 222
retval <- subset(data, likes == 222)
print(retval)

The output of the above code is:

  id      name                url likes
2  2 w3codebox www.oldtoolbag.com   222

Note: Equality conditions use ==, not =.

Multiple conditions are combined with the & operator. The following example finds the records whose likes value is greater than 1 and whose name is w3codebox:

data <- read.csv("sites.csv", encoding="UTF-8")
# Records where likes is greater than 1 and name is w3codebox
retval <- subset(data, likes > 1 & name == "w3codebox")
print(retval)

The output of the above code is:

  id      name                url likes
2  2 w3codebox www.oldtoolbag.com   222
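The same filter can also be written with logical indexing instead of subset(). The sketch below, which assumes the sites.csv records are supplied inline, runs both forms side by side so the results can be compared.

```r
# Same records as sites.csv, supplied inline so the sketch is self-contained
csv_text <- "id,name,url,likes
1,Google,www.google.com,111
2,w3codebox,www.oldtoolbag.com,222
3,Taobao,www.taobao.com,333"

sites <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# subset() looks up column names inside the data frame
by_subset <- subset(sites, likes > 1 & name == "w3codebox")

# Logical indexing spells out the data frame for every column
by_index <- sites[sites$likes > 1 & sites$name == "w3codebox", ]

print(by_subset)
print(by_index)
```

subset() is shorter for interactive use, while explicit indexing avoids any ambiguity about where a name such as likes is looked up.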

Save as CSV file

In R, the write.csv() function saves data as a CSV file.

Continuing the example above, we save the records whose likes value is 222 to the file w3codebox.csv:

data <- read.csv("sites.csv", encoding="UTF-8")
# Records where likes is 222
retval <- subset(data, likes == 222)
# Write to a new file
write.csv(retval, "w3codebox.csv")
newdata <- read.csv("w3codebox.csv")
print(newdata)

The output of the above code is:

  X id      name                url likes
1 2  2 w3codebox www.oldtoolbag.com   222

The X column holds the row names of the data frame, which write.csv() writes by default. It can be removed with the argument row.names = FALSE:

data <- read.csv("sites.csv", encoding="UTF-8")
# Records where likes is 222
retval <- subset(data, likes == 222)
# Write to a new file without the row names
write.csv(retval, "w3codebox.csv", row.names = FALSE)
newdata <- read.csv("w3codebox.csv")
print(newdata)

The output of the above code is:

  id      name                url likes
1  2 w3codebox www.oldtoolbag.com   222

After execution, we can see that the w3codebox.csv file was generated successfully.
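The write and read-back steps can also be checked from within R. The sketch below is a self-contained round trip that assumes the sites.csv records are supplied inline and writes to a temporary file instead of w3codebox.csv, so it leaves no file in the working directory.

```r
# Same records as sites.csv, supplied inline so the sketch is self-contained
csv_text <- "id,name,url,likes
1,Google,www.google.com,111
2,w3codebox,www.oldtoolbag.com,222
3,Taobao,www.taobao.com,333"

sites <- read.csv(text = csv_text, stringsAsFactors = FALSE)
picked <- subset(sites, likes == 222)

# Temporary output path used in place of w3codebox.csv
out_path <- tempfile(fileext = ".csv")
write.csv(picked, out_path, row.names = FALSE)

# Confirm the file exists and read it back
print(file.exists(out_path))
round_trip <- read.csv(out_path, stringsAsFactors = FALSE)
print(names(round_trip))
```

Because row.names = FALSE was used, the columns read back are the same four as in the original data, with no extra X column.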