Pandas Descriptive Statistics Operation Example
A DataFrame supports a large number of methods for computing descriptive statistics and related operations. Most of them are aggregations, such as sum() and mean(), which produce a lower-dimensional result, but some, such as cumsum() and cumprod(), produce an object of the same size. Generally, these methods take an axis parameter, just like ndarray.{sum, std, ...}, but the axis can be specified by name or by integer:
DataFrame − 'index' (axis=0, default), 'columns' (axis=1)
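As a minimal sketch of the axis convention and of the difference between reducing and same-size methods, consider the small frame below. The column names a and b and their values are made up for illustration and are not part of the tutorial's dataset.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the tutorial's dataset).
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Aggregations reduce along an axis and return a Series.
col_totals = df.sum(axis=0)          # one value per column (the default)
row_totals = df.sum(axis="columns")  # one value per row; same as axis=1

# Cumulative methods keep the shape of the original object.
running = df.cumsum()

print(col_totals)
print(row_totals)
print(running)
```

Passing the axis by name ('index' or 'columns') tends to read more clearly than the integer form, although both select the same axis.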
import pandas as pd

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

# Create a DataFrame
df = pd.DataFrame(d)
print(df)
Running Results:
    Age    Name  Rating
0    25     Tom    4.23
1    26   James    3.24
2    25   Ricky    3.98
3    23     Vin    2.56
4    30   Steve    3.20
5    29   Smith    4.60
6    23    Jack    3.80
7    34     Lee    3.78
8    40   David    2.98
9    30  Gasper    4.80
10   51  Betina    4.10
11   46  Andres    3.65
sum() returns the sum of the values along the requested axis. By default, the axis is the index (axis=0).
import pandas as pd

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

# Create a DataFrame
df = pd.DataFrame(d)

# Sum each column (axis=0 is the default)
print(df.sum())
Running Results:
Age                                                     382
Name      TomJamesRickyVinSteveSmithJackLeeDavidGasperBe...
Rating                                                44.92
dtype: object
Each column is summed individually. For the string column Name, the values are concatenated.
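Because string columns are concatenated rather than skipped, it can be useful to restrict the sum to numeric columns. The sketch below shows two ways this might be done, using only the first three rows of the dataset above to keep it short: the numeric_only parameter of sum(), and explicit column selection.

```python
import pandas as pd

# First three rows of the tutorial's dataset, kept short for illustration.
df = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky"],
    "Age": [25, 26, 25],
    "Rating": [4.23, 3.24, 3.98],
})

# Option 1: let pandas skip the non-numeric columns.
numeric_totals = df.sum(numeric_only=True)

# Option 2: select the columns to sum explicitly.
selected_totals = df[["Age", "Rating"]].sum()

print(numeric_totals)
print(selected_totals)
```

Both options return a Series indexed by Age and Rating only. Explicit selection is more verbose but makes the intent visible at the call site.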
With axis=1, the values are summed across the columns of each row instead:
import pandas as pd

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

# Create a DataFrame
df = pd.DataFrame(d)

# Sum across the columns of each row. numeric_only=True skips the Name
# column, which newer pandas versions no longer drop silently.
print(df.sum(axis=1, numeric_only=True))
Running Results:
0     29.23
1     29.24
2     28.98
3     25.56
4     33.20
5     33.60
6     26.80
7     37.78
8     42.98
9     34.80
10    55.10
11    49.65
dtype: float64
mean() returns the average value of each numeric column.
import pandas as pd

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

# Create a DataFrame
df = pd.DataFrame(d)

# numeric_only=True skips the Name column, which newer pandas versions
# no longer drop silently.
print(df.mean(numeric_only=True))
Running Results:
Age       31.833333
Rating     3.743333
dtype: float64
std() returns the Bessel-corrected (sample) standard deviation of the numeric columns.
import pandas as pd

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

# Create a DataFrame
df = pd.DataFrame(d)

# numeric_only=True skips the Name column, which newer pandas versions
# no longer drop silently.
print(df.std(numeric_only=True))
Running Results:
Age       9.232682
Rating    0.661628
dtype: float64
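To make the Bessel correction concrete, the following sketch compares the default std() on the Age values with a manual computation that divides by n - 1, and with the population form that divides by n. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

# pandas defaults to ddof=1, i.e. the sum of squared deviations is
# divided by n - 1 (Bessel's correction).
sample_std = ages.std()

# ddof=0 divides by n instead, which matches the default of np.std.
population_std = ages.std(ddof=0)

# The same sample standard deviation computed by hand.
n = len(ages)
manual_std = np.sqrt(((ages - ages.mean()) ** 2).sum() / (n - 1))

print(sample_std, population_std, manual_std)
```

The sample value is always at least as large as the population value for the same data, and the gap shrinks as n grows.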
Now let's understand the functions under descriptive statistics in Python Pandas. The following table lists important functions:
Number | Method | Description |
1 | count() | Number of non-null values |
2 | sum() | Sum of values |
3 | mean() | Mean |
4 | median() | Median |
5 | mode() | Mode |
6 | std() | Standard deviation |
7 | min() | Minimum value |
8 | max() | Maximum value |
9 | abs() | Absolute value |
10 | prod() | Product |
11 | cumsum() | Cumulative sum |
12 | cumprod() | Cumulative product |
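The methods in the table can be tried on a single Series. The sketch below uses the first five Age and Rating values from the dataset above; the variable names are illustrative only.

```python
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30])
ratings = pd.Series([4.23, 3.24, 3.98, 2.56, 3.20])

n_values = ages.count()           # number of non-null values
middle = ages.median()            # middle value of the sorted data
most_common = ages.mode()         # a Series, because ties are possible
lowest, highest = ages.min(), ages.max()
distances = (ages - 26).abs()     # element-wise absolute value
product = ages.prod()             # product of all values
running_sum = ages.cumsum()       # same length as the input
running_prod = ratings.cumprod()  # same length as the input

print(n_values, middle, lowest, highest, product)
print(most_common)
print(distances)
print(running_sum)
print(running_prod)
```

Note that abs(), cumsum(), and cumprod() return an object of the same length as the input, while the other methods reduce it to a single value (or, for mode(), to the set of most frequent values).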
The describe() function computes a summary of statistics for the DataFrame columns:

import pandas as pd

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

# Create a DataFrame
df = pd.DataFrame(d)
print(df.describe())
Running Results:
             Age     Rating
count  12.000000  12.000000
mean   31.833333   3.743333
std     9.232682   0.661628
min    23.000000   2.560000
25%    25.000000   3.230000
50%    29.500000   3.790000
75%    35.500000   4.132500
max    51.000000   4.800000
describe() reports the count, mean, standard deviation, minimum, maximum, and quartiles (25%, 50%, 75%) of each column; the IQR is the difference between the 75% and 25% values. By default, it excludes string columns and summarizes only the numeric ones. The include parameter controls which columns are considered when summarizing. It accepts a list of values; the default behavior corresponds to 'number'.
object − Summarizes string columns
number − Summarizes numeric columns
all − Summarizes all columns together (it should not be passed as a list value)

The following program passes include=['object'] to summarize only the string column:
import pandas as pd

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

# Create a DataFrame
df = pd.DataFrame(d)

# Summarize only the string (object) columns
print(df.describe(include=['object']))
Running Results:
         Name
count      12
unique     12
top     Ricky
freq        1
The following program passes include='all' to summarize every column:
import pandas as pd

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

# Create a DataFrame
df = pd.DataFrame(d)

# Summarize all columns; statistics that do not apply to a column are NaN
print(df.describe(include='all'))
Running Results:
              Age   Name     Rating
count   12.000000     12  12.000000
unique        NaN     12        NaN
top           NaN  Ricky        NaN
freq          NaN      1        NaN
mean    31.833333    NaN   3.743333
std      9.232682    NaN   0.661628
min     23.000000    NaN   2.560000
25%     25.000000    NaN   3.230000
50%     29.500000    NaN   3.790000
75%     35.500000    NaN   4.132500
max     51.000000    NaN   4.800000
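The describe() output does not contain an IQR row, but the IQR can be derived from the 25% and 75% rows. The following is a minimal sketch on the numeric columns of the same dataset; the variable names summary and iqr are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    "Rating": [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# describe() returns a DataFrame indexed by statistic name.
summary = df.describe()

# Interquartile range: third quartile minus first quartile, per column.
iqr = summary.loc["75%"] - summary.loc["25%"]

print(iqr)
```

For this data the result is 10.5 for Age (35.5 - 25.0) and 0.9025 for Rating (4.1325 - 3.23), which agrees with the quartiles shown in the describe() output above.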