Descriptive Statistics in Pandas

Pandas Descriptive Statistics Operation Example

A DataFrame supports a large number of descriptive statistics and related computations. Most of them are aggregations such as sum() and mean(), which reduce the data to a smaller result, but some (such as cumsum()) produce an object of the same size. Like the corresponding ndarray methods ({sum, std, ...}), these methods generally take an axis parameter, which can be specified by name or by integer: index (axis=0, the default) or columns (axis=1).
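As a quick illustration of the axis parameter, here is a minimal sketch on a small made-up frame. The frame and its column names (math, art) are assumptions for this example only and are not part of the data used in the rest of this chapter.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the axis parameter.
scores = pd.DataFrame({"math": [90, 80, 70], "art": [60, 50, 40]})

# axis=0 (or "index") reduces down the rows: one result per column.
per_column = scores.sum(axis=0)
same_per_column = scores.sum(axis="index")

# axis=1 (or "columns") reduces across the columns: one result per row.
per_row = scores.sum(axis=1)
same_per_row = scores.sum(axis="columns")

# cumsum() is not a reduction: the result has the same shape as the input.
running = scores.cumsum()

print(per_column)
print(per_row)
print(running)
```

The integer and the named form of axis should give identical results, so which one to use is mostly a matter of readability.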

Let's create a DataFrame and use this object for all operations in this chapter.

Instance

Example

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)
    print(df)

Running Results:

        Age    Name  Rating
    0    25     Tom    4.23
    1    26   James    3.24
    2    25   Ricky    3.98
    3    23     Vin    2.56
    4    30   Steve    3.20
    5    29   Smith    4.60
    6    23    Jack    3.80
    7    34     Lee    3.78
    8    40   David    2.98
    9    30  Gasper    4.80
    10   51  Betina    4.10
    11   46  Andres    3.65

sum()

Returns the sum of the values along the requested axis. By default, the axis is the index (axis=0).

Example

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    # Sum each column (axis=0 is the default)
    print(df.sum())

Running Results:

    Age                                                     382
    Name      TomJamesRickyVinSteveSmithJackLeeDavidGasperBe...
    Rating                                                44.92
    dtype: object

Each column is summed individually; string values are concatenated.

axis=1

Passing axis=1 sums the numeric values across the columns of each row and produces the following output.

Example

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    # Sum across the columns of each row. numeric_only=True skips the string
    # column 'Name', which cannot be added to numbers.
    print(df.sum(axis=1, numeric_only=True))

Running Results:

    0     29.23
    1     29.24
    2     28.98
    3     25.56
    4     33.20
    5     33.60
    6     26.80
    7     37.78
    8     42.98
    9     34.80
    10    55.10
    11    49.65
    dtype: float64

mean()

Returns the average (arithmetic mean) of the values along the requested axis.

Example

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    # numeric_only=True skips the string column 'Name'
    print(df.mean(numeric_only=True))

Running Results:

    Age       31.833333
    Rating     3.743333
    dtype: float64

std()

Returns the sample standard deviation of the numerical columns, computed with Bessel's correction (dividing by N - 1 rather than N).

Example

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    # numeric_only=True skips the string column 'Name'
    print(df.std(numeric_only=True))

Running Results:

    Age       9.232682
    Rating    0.661628
    dtype: float64
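To make the effect of Bessel's correction concrete, the sketch below compares the pandas default (ddof=1) with the population formula (ddof=0) on the Age values from this chapter. The variable names are illustrative assumptions; only the std() and np.std() calls come from pandas and NumPy.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

sample_std = ages.std()              # pandas default: ddof=1, divide by N - 1
population_std = ages.std(ddof=0)    # divide by N instead
numpy_std = np.std(ages.to_numpy())  # NumPy default is ddof=0

# The two estimators differ by a factor of sqrt(N / (N - 1)).
n = len(ages)
factor = np.sqrt(n / (n - 1))

print(sample_std, population_std)
print(np.isclose(sample_std, population_std * factor))
```

This is worth keeping in mind when comparing pandas results with NumPy results, since the two libraries use different defaults for ddof.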

Functions & Description

The following table lists the most important descriptive statistics functions in pandas:

Number	Method	Description
1	count()	Number of non-null values
2	sum()	Sum of values
3	mean()	Mean of values
4	median()	Median of values
5	mode()	Mode of values
6	std()	Standard deviation of values
7	min()	Minimum value
8	max()	Maximum value
9	abs()	Absolute value
10	prod()	Product of values
11	cumsum()	Cumulative sum
12	cumprod()	Cumulative product
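A few of the functions in the table can be tried on a single numeric column. The sketch below uses the Age values from this chapter plus a tiny made-up series (signed, an assumption for this example) to show abs() and prod().

```python
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46], name="Age")

print(ages.count())             # number of non-null values
print(ages.median())            # middle of the sorted values
print(ages.mode())              # most frequent value(s), returned as a Series
print(ages.min(), ages.max())   # smallest and largest value
print(ages.cumsum())            # running total, same length as the input

# Hypothetical values, used only to show abs() and prod().
signed = pd.Series([-1, 2, -3])
print(signed.abs())             # element-wise absolute value
print(signed.prod())            # product of all values
```

Note that mode() returns a Series rather than a scalar, because several values can be tied for the highest frequency.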

Note: because a DataFrame is a heterogeneous data structure, generic operations do not work with all functions.

Functions such as sum() and cumsum() work with both numeric and character (string) data without raising an error. Although aggregating character data is rarely useful in practice, these functions do not throw an exception.

Functions such as abs() and cumprod() raise an exception when the DataFrame contains character or string data, because those operations cannot be performed on strings.
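The difference described above can be seen on a small string column. This is a sketch under one assumption: the series is created with dtype=object explicitly, since other string dtypes may behave differently.

```python
import pandas as pd

# dtype=object is set explicitly (an assumption of this sketch).
names = pd.Series(["Tom", "James", "Ricky"], dtype=object)

print(names.sum())      # strings are concatenated
print(names.cumsum())   # running concatenation

# abs() has no meaning for strings, so it raises a TypeError.
try:
    names.abs()
    abs_failed = False
except TypeError as exc:
    abs_failed = True
    print("abs() failed:", exc)
```

In real code it is usually simpler to select the numeric columns first than to rely on catching the exception.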

Summarize data

Example

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    # Summary statistics of the numeric columns
    print(df.describe())

Running Results:

                 Age     Rating
    count  12.000000  12.000000
    mean   31.833333   3.743333
    std     9.232682   0.661628
    min    23.000000   2.560000
    25%    25.000000   3.230000
    50%    29.500000   3.790000
    75%    35.500000   4.132500
    max    51.000000   4.800000

The describe() function reports the count, mean, standard deviation, minimum, maximum, and the quartiles (from which the IQR can be derived). By default it excludes character columns and summarizes only the numeric ones. The include parameter specifies which columns should be considered when summarizing. It takes a list of values; by default, only numeric columns ('number') are summarized.

object − Summarize string columns
number − Summarize numerical columns
all − Summarize all columns together (pass it as a plain string, not as a list value)
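As a small illustration of the numeric summary, the sketch below restricts describe() to numeric columns and derives the IQR of Age from the 25% and 75% rows. The shortened frame and the variable names summary and iqr are assumptions for this example.

```python
import pandas as pd

# A reduced version of the chapter's data: one string and one numeric column.
df = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky", "Vin", "Steve", "Smith", "Jack",
             "Lee", "David", "Gasper", "Betina", "Andres"],
    "Age": [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
})

# Summarize only the numeric columns.
summary = df.describe(include=["number"])

# The interquartile range is the distance between the 75% and 25% quantiles.
iqr = summary.loc["75%", "Age"] - summary.loc["25%", "Age"]

print(summary)
print("IQR of Age:", iqr)
```

describe() does not report the IQR directly, so it has to be computed from the quartile rows in this way.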

The following program passes include=['object'] to summarize only the string column:

Example

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    # Summarize only the string (object) columns
    print(df.describe(include=['object']))

Running Results:

             Name
    count      12
    unique     12
    top     Ricky
    freq        1

The following program passes include='all' to summarize every column:

Example

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    # Summarize all columns together
    print(df.describe(include='all'))

Running Results:

                  Age   Name     Rating
    count   12.000000     12  12.000000
    unique        NaN     12        NaN
    top           NaN  Ricky        NaN
    freq          NaN      1        NaN
    mean    31.833333    NaN   3.743333
    std      9.232682    NaN   0.661628
    min     23.000000    NaN   2.560000
    25%     25.000000    NaN   3.230000
    50%     29.500000    NaN   3.790000
    75%     35.500000    NaN   4.132500
    max     51.000000    NaN   4.800000
