The behavior of basic iteration (a for loop) over pandas objects depends on the type. Iterating over a Series treats it like an array and yields its values. Other data structures, such as DataFrame and Panel, follow a dict-like convention and yield the keys of the object. (Panel was removed in pandas 0.25.)
In short, basic iteration (for i in object) produces −
Series − values
DataFrame − column labels
Panel − item labels
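As a minimal sketch of the summary above, the following compares iteration over a Series with iteration over a DataFrame. The data and the variable names (s, df, series_values, frame_labels) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the two behaviors
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Iterating over a Series yields its values, not its index labels
series_values = [value for value in s]

# Iterating over a DataFrame yields its column labels, like dict keys
frame_labels = [label for label in df]

print(series_values)  # the values 10, 20, 30
print(frame_labels)   # ['col1', 'col2']
```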
Iterating over a DataFrame gives column names. Let's see the following example.
    import pandas as pd
    import numpy as np

    N = 20
    df = pd.DataFrame({
        'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
        'x': np.linspace(0, stop=N-1, num=N),
        'y': np.random.rand(N),
        'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
        'D': np.random.normal(100, 10, size=(N)).tolist()
    })

    # Iterating over a DataFrame yields its column labels
    for col in df:
        print(col)

The output is as follows (on Python 3 with a recent pandas, columns appear in the order in which they were defined):

    A
    x
    y
    C
    D
To iterate over the contents of a DataFrame, we can use the following methods −

items() (called iteritems() before pandas 2.0) − iterates over columns as (label, Series) pairs
iterrows() − iterates over rows as (index, Series) pairs
itertuples() − iterates over rows as namedtuples

items() iterates over the columns, yielding each column label as the key and the column's values as a Series object.
    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(4, 3), columns=['col1', 'col2', 'col3'])

    # items() replaces iteritems(), which was removed in pandas 2.0
    for key, value in df.items():
        print(key, value)
Running Result:
    col1 0    0.802390
    1    0.324060
    2    0.256811
    3    0.839186
    Name: col1, dtype: float64
    col2 0    1.624313
    1   -1.033582
    2    1.796663
    3    1.856277
    Name: col2, dtype: float64
    col3 0   -0.022142
    1   -0.230820
    2    1.160691
    3   -0.830279
    Name: col3, dtype: float64
Each column is yielded separately as a key-value pair: the column label and a Series holding that column's values.
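One possible use of column-wise iteration is collecting a per-column summary. The sketch below assumes a small, made-up DataFrame and a hypothetical result name, column_max. For a simple reduction like this, df.max() would give the same numbers directly, so the loop is only meant to show the (label, Series) pairs in action.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [0.5, 2.5, 1.5]})

# Each iteration yields a column label and that column as a Series
column_max = {label: column.max() for label, column in df.items()}

print(column_max)
```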
iterrows() returns an iterator that yields each index value together with a Series containing the data of that row.
    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(4, 3), columns=['col1', 'col2', 'col3'])

    # Each iteration yields the row's index label and the row as a Series
    for row_index, row in df.iterrows():
        print(row_index, row)
Running Result:
    0 col1    1.529759
    col2    0.762811
    col3   -0.634691
    Name: 0, dtype: float64
    1 col1   -0.944087
    col2    1.420919
    col3   -0.507895
    Name: 1, dtype: float64
    2 col1   -0.077287
    col2   -0.858556
    col3   -0.663385
    Name: 2, dtype: float64
    3 col1   -1.638578
    col2    0.059866
    col3    0.493482
    Name: 3, dtype: float64
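Because iterrows() returns each row as a Series, the data types of the individual columns are not preserved within a row: the values are converted to a single common dtype. In the output above, 0 to 3 are the row index labels, and col1, col2, col3 are the column labels.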
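The loss of dtypes is easier to see with columns of different types. The following is a minimal sketch, assuming a made-up DataFrame with one integer column and one float column (the column names count and ratio are hypothetical).

```python
import pandas as pd

# Hypothetical data: one integer column and one float column
df = pd.DataFrame({"count": [1, 2], "ratio": [0.5, 1.5]})

# Take only the first (index, Series) pair produced by iterrows()
first_index, first_row = next(df.iterrows())

print(df["count"].dtype)  # int64: the column itself holds integers
print(first_row.dtype)    # float64: the row Series uses one common dtype
```

The integer in the count column comes back as a float inside the row, because a single Series has to hold both values.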
The itertuples() method returns an iterator that generates a named tuple for each row in the DataFrame. The first element of the tuple will be the corresponding index value of the row, and the rest will be the row values.
    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(4, 3), columns=['col1', 'col2', 'col3'])

    # Each row is yielded as a namedtuple whose first field is the index
    for row in df.itertuples():
        print(row)
Running Result:
    Pandas(Index=0, col1=1.5297586201375899, col2=0.76281127433814944, col3=-0.6346908238310438)
    Pandas(Index=1, col1=-0.94408735763808649, col2=1.4209186418359423, col3=-0.50789517967096232)
    Pandas(Index=2, col1=-0.07728664756791935, col2=-0.85855574139699076, col3=-0.6633852507207626)
    Pandas(Index=3, col1=0.65734942534106289, col2=-0.95057710432604969, col3=0.80344487462316527)
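The fields of each namedtuple can be read by attribute name. The sketch below assumes a small, made-up DataFrame; it also uses the index and name parameters of itertuples() to drop the index and get plain tuples instead. The variable names first and plain_rows are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"col1": [1, 2], "col2": [0.5, 1.5]})

# Default call: namedtuples with an Index field followed by the columns
first = next(df.itertuples())
print(first.Index, first.col1, first.col2)

# index=False omits the index; name=None yields regular tuples
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows)
```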
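Do not try to modify an object while iterating over it. Iteration is meant for reading: iterrows() hands back a copy of each row, so changes made to it generally do not reach the original DataFrame, as the following example shows.

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(4, 3), columns=['col1', 'col2', 'col3'])

    # Assigning to the row Series does not write back to df
    for index, row in df.iterrows():
        row['a'] = 10

    print(df)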
Running Result:
           col1      col2      col3
    0 -1.739815  0.735595 -0.295589
    1  0.635485  0.106803  1.527922
    2 -0.939064  0.547095  0.038585
    3 -1.016509 -0.116580 -0.523158
Note that the DataFrame is unchanged: no column 'a' was added.
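If the goal is to change the data, assigning to the DataFrame itself, rather than to the objects produced during iteration, is one option. The following is a sketch of that alternative, not part of the original example: it adds the column 'a' directly and then updates some rows through .loc (the condition on col1 is an arbitrary, hypothetical choice).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 3), columns=["col1", "col2", "col3"])

# Assign to the DataFrame itself: every row gets the value 10
df["a"] = 10

# Conditional update through .loc, again on the DataFrame itself
df.loc[df["col1"] < 0, "a"] = -10

print(df)
```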