
R Data Types

Data type refers to a broad system used to declare different types of variables or functions.

The type of a variable determines the space occupied by the variable and how to interpret the stored bit pattern.

R has three basic data types:

  • Numeric

  • Logical

  • Text

There are two main forms of numeric constants:

General form           123       -0.125
Scientific notation    1.23e2    -1.25E-1

Logical type is often called Boolean in many other programming languages, and the constant values are only TRUE and FALSE.

Note: R is case-sensitive; true or True cannot stand for TRUE.

The most intuitive data type is the text type. Text is what is commonly called a string (String) in other languages, and constants are enclosed in double quotes. In R language, text constants can be enclosed in either single or double quotes, for example:

> 'w3codebox' == "w3codebox"
[1] TRUE

Variable definition in R language is not like the syntax rules in some strongly typed languages, where it is necessary to set names and data types for variables specifically. Every time an assignment operator is used in R, it actually defines a new variable:

a = 1
b <- TRUE
b = "abc"
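The effect of reassignment can be checked with the built-in class() function. The following is a minimal sketch; the variable names x, class_before, and class_after are illustrative and not part of the original text:

```r
# A name has no fixed type; each assignment rebinds it to a new value.
x <- 1
class_before <- class(x)   # type while x holds a number
x <- "abc"
class_after <- class(x)    # type after x is rebound to text
print(c(class_before, class_after))
```

Here the same name first reports "numeric" and then "character".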

Categorized by object type, R has 6 types, which will be introduced in detail later.

Vector

Vectors (Vector) are often provided in the standard libraries of programming languages such as Java, Rust, and C#, because vectors are indispensable in mathematical operations. The most common are two-dimensional vectors, which are routinely used in the plane coordinate system.

From the perspective of data structure, a vector is a linear table and can be regarded as an array.

In R language, vectors as a type make vector operations easier:

> a = c(3, 4)
> b = c(5, 0)
> a + b
[1] 8 4

c() is a function for creating vectors.

Here, two two-dimensional vectors are added together to get a new two-dimensional vector (8, 4). If a two-dimensional vector and a three-dimensional vector are operated on together, the result loses its mathematical meaning; the operation does not stop, but a warning is issued.
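As a minimal sketch of the mismatched-length case, the snippet below adds a two-element and a three-element vector. The vector d and the helper names warned and result are assumptions made for illustration:

```r
# Adding vectors of unequal length recycles the shorter one.
# Lengths 2 and 3 do not divide evenly, so R issues a warning.
a <- c(3, 4)
d <- c(1, 2, 3)   # hypothetical three-element vector

# Record whether the addition raises a warning.
warned <- tryCatch({ a + d; FALSE }, warning = function(w) TRUE)

# Compute the value itself with the warning silenced.
result <- suppressWarnings(a + d)   # a is reused as 3, 4, 3
print(warned)
print(result)
```

The shorter vector is reused from its start, which is why the result is rarely what was intended mathematically.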

I suggest you make a habit of avoiding this situation.

Each element of a vector can be extracted individually by its index:

> a = c(10, 20, 30, 40, 50)
> a[2]
[1] 20

Note: In R, the 'index' does not represent an offset but an ordinal position; that is, indexing starts from 1!

R can also easily extract a part of a vector:

> a[1:4] # Take items 1 to 4, including items 1 and 4
[1] 10 20 30 40
> a[c(1, 3, 5)] # Take items 1, 3, and 5
[1] 10 30 50
> a[c(-1, -5)] # Remove items 1 and 5
[1] 20 30 40

These three methods of partial extraction are the most commonly used.

Vectors support scalar calculations:

> c(1.1, 1.2, 1.3) - 0.5
[1] 0.6 0.7 0.8
> a = c(1,2)
> a ^ 2
[1] 1 4

The commonly used mathematical operation functions such as sqrt, exp, etc., can also be used for scalar operations on vectors.
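A short sketch of element-wise math functions follows; the input values and variable names are chosen only for illustration:

```r
# sqrt() and exp() are applied to each element in turn.
v <- c(1, 4, 9)
roots <- sqrt(v)        # square root of every element
powers <- exp(c(0, 0))  # e^0 for every element
print(roots)
print(powers)
```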

"Vector" as a linear table structure should have some common linear table processing functions, and R indeed has these functions:

Vector sorting:

> a = c(1, 3, 5, 2, 4, 6)
> sort(a)
[1] 1 2 3 4 5 6
> rev(a)
[1] 6 4 2 5 3 1
> order(a)
[1] 1 4 2 5 3 6
> a[order(a)]
[1] 1 2 3 4 5 6

The order() function returns a vector of indices after sorting.
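One common use of order() is to reorder a second vector by the values of the first. This is a sketch with made-up data; scores and labels are hypothetical names:

```r
# order() gives the positions that would sort `scores`;
# the same positions can then reorder a parallel vector.
scores <- c(72, 95, 60)
labels <- c("ann", "bob", "cat")   # hypothetical labels, one per score

idx <- order(scores)               # positions from smallest to largest score
sorted_labels <- labels[idx]
print(idx)
print(sorted_labels)
```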

Vector statistics

R has a very complete set of statistical functions:

Function name    Meaning
sum              Sum
mean             Mean
var              Variance
sd               Standard deviation
min              Minimum value
max              Maximum value
range            Value range (a two-element vector holding the minimum and maximum)

Vector statistics examples:

> sum(1:5)
[1] 15
> sd(1:5)
[1] 1.581139
> range(1:5)
[1] 1 5
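The remaining functions from the table can be tried on the same input. A minimal sketch, with variable names chosen for illustration:

```r
# Summary statistics for the integers 1 to 5.
x <- 1:5
m <- mean(x)          # arithmetic mean
v <- var(x)           # sample variance
s <- sd(x)            # sample standard deviation, the square root of var
lo <- min(x)
hi <- max(x)
print(c(m, v, s, lo, hi))
```

Note that var() and sd() use the sample (n - 1) denominator.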

Vector generation

Vectors can be generated with the c() function, or with the min:max operator to generate a continuous sequence.

If you want to generate an arithmetic sequence with gaps, you can use the seq function:

> seq(1, 9, 2)
[1] 1 3 5 7 9

seq can also generate an arithmetic sequence from m to n, just specify m, n, and the length of the sequence:

> seq(0, 1, length.out=3)
[1] 0.0 0.5 1.0

rep means repeat (repetition) and can be used to generate a sequence of repeated numbers:

> rep(0, 5)
[1] 0 0 0 0 0
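The three generators can be compared side by side. This is a sketch; the particular arguments are illustrative choices, not taken from the original:

```r
# Three ways to generate vectors.
s1 <- 1:5                       # min:max operator, step of 1
s2 <- seq(10, 1, by = -3)       # seq with a negative step
s3 <- rep(c(1, 2), times = 3)   # rep repeats a whole vector
print(s1)
print(s2)
print(s3)
```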

NA and NULL are often used in vectors. Here is an introduction to these two terms and their differences:

  • NA represents 'missing', while NULL represents 'non-existent'.

  • NA represents a missing value as a placeholder, indicating that there is no value here, but the position exists.

  • NULL represents the absence of data.

Example explanation:

> length(c(NA, NA, NULL))
[1] 2
> c(NA, NA, NULL, NA)
[1] NA NA NA

As the output shows, NULL contributes no element to a vector: it is simply dropped when vectors are combined.
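The difference also shows up in how the two values are tested and handled. A minimal sketch using the base functions is.na(), is.null(), and the na.rm argument of mean(); the data are made up:

```r
# NA keeps its position in a vector; NULL has length zero.
x <- c(1, NA, 3)
missing_flags <- is.na(x)              # TRUE where a value is missing
mean_with_na <- mean(x)                # NA, because one value is unknown
mean_without <- mean(x, na.rm = TRUE)  # skip missing values
null_len <- length(NULL)
null_check <- is.null(NULL)
print(missing_flags)
print(mean_without)
```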

Logical

Logical vectors are mainly used for logical operations on vectors, such as:

> c(1, 2, 3) > 2
[1] FALSE FALSE TRUE

The which function is a very common logical vector processing function, which can be used to filter the indices of the data we need:

a = c(1, 2, 3)
b = a > 2
print(b)
[1] FALSE FALSE TRUE
which(b)
[1] 3

For example, suppose we need to select the values greater than or equal to 60 and less than 70 from a vector:

vector = c(10, 40, 78, 64, 53, 62, 69, 70)
print(vector[which(vector >= 60 & vector < 70)])
[1] 64 62 69
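A logical vector can also be used as an index directly, without which(). The sketch below reuses the data from the example; mask, picked, and count are illustrative names:

```r
# Index with the logical vector itself instead of positions.
vector <- c(10, 40, 78, 64, 53, 62, 69, 70)
mask <- vector >= 60 & vector < 70   # TRUE for values in [60, 70)
picked <- vector[mask]               # keeps elements where mask is TRUE
count <- sum(mask)                   # TRUE counts as 1, FALSE as 0
print(picked)
print(count)
```

The two forms differ when the data contain NA: which() drops NA positions, while direct logical indexing returns NA for them.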

Similar functions include all and any:

all(c(TRUE, TRUE, TRUE))
[1] TRUE
all(c(TRUE, TRUE, FALSE))
[1] FALSE
any(c(TRUE, FALSE, FALSE))
[1] TRUE
any(c(FALSE, FALSE, FALSE))
[1] FALSE

all() checks whether every element of a logical vector is TRUE; any() checks whether at least one element is TRUE.
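In practice these functions are usually applied to the result of a comparison. A small sketch with hypothetical scores:

```r
# Combine a comparison with all() / any().
scores <- c(64, 62, 69)        # hypothetical data
all_pass <- all(scores >= 60)  # is every score at least 60?
any_top <- any(scores >= 70)   # is any score at least 70?
print(c(all_pass, any_top))
```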

String

The data type of string is not complex in itself, here we focus on introducing the string operation functions:

toupper("w3codebox")  # Convert to uppercase
[1] "W3CODEBOX"
tolower("w3codebox")  # Convert to lowercase
[1] "w3codebox"
nchar("中文", type="bytes")  # Byte length count
[1] 4
nchar("中文", type="chars")  # Total character count
[1] 2
substr("123456789", 1, 5)  # Cut the string, from 1 to 5
[1] "12345"
substring("1234567890", 5)  # Cut the string, from 5 to the end
[1] "567890"
as.numeric("12")  # Convert string to number
[1] 12
as.character(12.34)  # Convert numbers to strings
[1] "12.34"
strsplit("2019;10;1", ";")  # Split the string by delimiter
[[1]]
[1] "2019" "10"   "1"
gsub("/", "-", "2019/10/1")  # Replace the string
[1] "2019-10-1"

The example above was run on a Windows computer using the GBK encoding, so each Chinese character takes two bytes. On a computer that uses UTF-8, the byte length of a single Chinese character is 3.

R supports regular expressions (POSIX-style by default, and Perl-compatible when perl = TRUE is passed):

gsub("[[:alpha:]]+", "$", "Two words")
[1] "$ $"
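A few variations on the replacement above, as a sketch. The patterns and variable names are illustrative; sub(), gsub(), backreferences, and the perl argument are all part of base R:

```r
# sub() replaces only the first match; gsub() replaces all of them.
first_only <- sub("[[:alpha:]]+", "$", "Two words")

# Backreferences \\1 and \\2 refer to the parenthesized groups.
swapped <- gsub("([[:alpha:]]+) ([[:alpha:]]+)", "\\2 \\1", "Two words")

# perl = TRUE switches to Perl-compatible syntax such as \\d.
masked <- gsub("\\d+", "#", "a1b22", perl = TRUE)

print(c(first_only, swapped, masked))
```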

More string content can be found at: R Language String Introduction.

Matrix

R language provides matrix types for the study of linear algebra, this data structure is very similar to two-dimensional arrays in other languages, but R provides language-level matrix operation support.

First let's see the generation of the matrix:

> vector=c(1, 2, 3, 4, 5, 6)
> matrix(vector, 2, 3)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

The initial contents of a matrix are passed in as a vector, followed by the number of rows and the number of columns.

The values in the vector are filled into the matrix column by column. If you want to fill it row by row, you need to specify the byrow argument:

> matrix(vector, 2, 3, byrow=TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
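One way to see how the two fill orders relate: filling by row gives the same matrix as filling a 3 x 2 matrix by column and transposing it with t(). A minimal sketch, with illustrative variable names:

```r
# Row-wise fill compared with column-wise fill plus transpose.
v <- c(1, 2, 3, 4, 5, 6)
by_row <- matrix(v, 2, 3, byrow = TRUE)   # rows: 1 2 3 / 4 5 6
via_t <- t(matrix(v, 3, 2))               # columns filled first, then transposed
same <- isTRUE(all.equal(by_row, via_t))
print(by_row)
print(same)
```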

Each value in a matrix can be accessed directly:

> m1 = matrix(vector, 2, 3, byrow=TRUE)
> m1[1,1] # Row 1, column 1
[1] 1
> m1[1,3] # Row 1, column 3
[1] 3

Each column and row of a matrix in R can be named; this is done by assigning character vectors to colnames() and rownames():

> colnames(m1) = c("x", "y", "z")
> rownames(m1) = c("a", "b")
> m1
  x y z
a 1 2 3
b 4 5 6
> m1["a", ]
x y z 
1 2 3
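Names can also be supplied when the matrix is created and then used to index single cells or whole columns. This sketch assumes the dimnames argument of matrix(); the object name m is illustrative:

```r
# Build a named matrix in one call and index it by name.
m <- matrix(1:6, 2, 3, byrow = TRUE,
            dimnames = list(c("a", "b"), c("x", "y", "z")))
cell <- m["b", "z"]    # single element: row "b", column "z"
col_y <- m[, "y"]      # whole column, returned as a named vector
print(cell)
print(col_y)
```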

Matrix arithmetic operations are basically consistent with vector operations, and can be performed with scalars or corresponding positions of matrices of the same size.
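The distinction between element-wise arithmetic and true matrix multiplication is easy to miss, so here is a small sketch with an illustrative 2 x 2 matrix:

```r
# Element-wise operations use *, +, and so on; %*% is the matrix product.
m <- matrix(c(1, 2, 3, 4), 2, 2)   # columns are (1, 2) and (3, 4)
scaled <- m * 2                    # every element times a scalar
elementwise <- m * m               # each element squared
product <- m %*% m                 # linear-algebra product
print(elementwise)
print(product)
```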

Matrix multiplication operation:

> m1 = matrix(c(1, 2), 1, 2)
> m2 = matrix(c(3, 4), 2, 1)
> m1 %*% m2
     [,1]
[1,]   11

Inverse matrix:

> A = matrix(c(1, 3, 2, 4), 2, 2)
> solve(A)
     [,1] [,2]
[1,] -2.0  1.0
[2,]  1.5 -0.5
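The inverse can be checked by multiplying it back, and solve() also accepts a right-hand side to solve a linear system directly. A minimal sketch; the vector b is a made-up example:

```r
# Verify the inverse and solve A x = b.
A <- matrix(c(1, 3, 2, 4), 2, 2)
A_inv <- solve(A)
is_identity <- isTRUE(all.equal(A %*% A_inv, diag(2)))  # A times its inverse

b <- c(5, 11)       # hypothetical right-hand side
x <- solve(A, b)    # solves x1 + 2*x2 = 5 and 3*x1 + 4*x2 = 11
print(is_identity)
print(x)
```

Passing b to solve() is generally preferable to forming the inverse explicitly when only the solution is needed.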

The apply() function can treat each row or column of a matrix as a vector for operations:

> (A = matrix(c(1, 3, 2, 4), 2, 2))
     [,1] [,2]
[1,]    1    2
[2,]    3    4
> apply(A, 1, sum)  # Second argument 1: operate on rows, using the sum() function
[1] 3 7
> apply(A, 2, sum)  # Second argument 2: operate on columns
[1] 4 6
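apply() works with any function that takes a vector, and for plain sums base R also offers rowSums() and colSums(). A sketch with illustrative variable names:

```r
# apply() with different functions, compared with rowSums().
A <- matrix(c(1, 3, 2, 4), 2, 2)
row_totals <- apply(A, 1, sum)   # sum over each row
col_max <- apply(A, 2, max)      # maximum of each column
same_as_rowsums <- isTRUE(all.equal(row_totals, rowSums(A)))
print(row_totals)
print(col_max)
```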

More matrix content can be found at: R Matrices.