
Pandas Caveats and Gotchas

Using If/Truth Statements with Pandas

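Pandas follows the NumPy convention of raising an error when you try to convert a Series or DataFrame to a bool. This happens in an if statement or when you use the boolean operators and, or, or not. It is not clear what the result should be. Should it be True because the object is not zero-length? Should it be False because it contains False values? Because the answer is ambiguous, Pandas raises a ValueError: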

 import pandas as pd
 if pd.Series([False, True, False]):
     print('I am True')

The results of the execution are as follows:

 ValueError: The truth value of a Series is ambiguous. 
 Use a.empty, a.bool(), a.item(), a.any(), or a.all().

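In an if condition, Pandas cannot tell which interpretation you intend. The error message therefore asks you to state it explicitly with one of the listed methods. For example, .any() checks whether at least one element is True: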

 import pandas as pd
 if pd.Series([False, True, False]).any():
    print("I am any")

The results of the execution are as follows:

I am any
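The other methods named in the error message each answer a different question, so picking one makes the intent explicit. The sketch below shows this on the same three-element Series. It also includes an empty Series, which is where .any() and .all() disagree (all() of nothing is vacuously True).

```python
import pandas as pd

s = pd.Series([False, True, False])

# Each attribute or method answers one explicit question about the Series.
is_empty = s.empty   # does the Series have zero elements?
any_true = s.any()   # is at least one element True?
all_true = s.all()   # is every element True?
print(is_empty, any_true, all_true)  # False True False

# On an empty Series, any() is False but all() is True (vacuous truth).
empty = pd.Series([], dtype=bool)
print(empty.empty, empty.any(), empty.all())  # True False True
```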

To evaluate a single-element Pandas object in a boolean context, use the .bool() method. Note that .bool() is deprecated as of pandas 2.1, and .item() is the suggested replacement:

 import pandas as pd
 print(pd.Series([True]).bool())

The results of the execution are as follows:

True
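.bool() is deprecated in recent pandas versions. The following is a minimal sketch of the .item() alternative. It returns the single element of a length-1 Series as a Python scalar and raises ValueError for any other length, so it keeps the same "refuse to guess" behavior.

```python
import pandas as pd

# .item() extracts the only element of a length-1 Series.
single = pd.Series([True]).item()
print(single)  # True

# With more (or fewer) than one element, .item() raises instead of guessing.
try:
    pd.Series([True, False]).item()
except ValueError as err:
    print("ValueError:", err)
```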

Bitwise boolean operators

Element-wise comparison operators such as == and != return a boolean Series, which is almost always what you need anyway.

 import pandas as pd
 s = pd.Series(range(5))
 print(s == 4)

The results of the execution are as follows:

 0 False
 1 False
 2 False
 3 False
 4 True
 dtype: bool
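To combine several such boolean Series, use the bitwise operators & (and), | (or), and ~ (not) rather than the keywords and, or, and not, which would trigger the ValueError described earlier. The sketch below reuses the same Series. Each comparison is wrapped in parentheses because & and | bind more tightly than > or !=.

```python
import pandas as pd

s = pd.Series(range(5))

# & and | combine boolean Series element by element; ~ negates one.
mask = (s > 1) & (s != 4)
print(s[mask].tolist())                  # [2, 3]
print(s[(s == 0) | (s == 4)].tolist())   # [0, 4]
print(s[~mask].tolist())                 # [0, 1, 4]

# The keyword "and" calls bool() on the left-hand Series, which raises.
try:
    (s > 1) and (s != 4)
except ValueError as err:
    print("ValueError:", err)
```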

isin operation

This returns a boolean Series indicating whether each element of the Series is contained in the passed sequence of values.

 import pandas as pd
 s = pd.Series(list('abc'))
 s = s.isin(['a', 'c', 'e'])
 print(s)

The results of the execution are as follows:

 0 True
 1 False
 2 True
 dtype: bool
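Because isin returns a boolean Series, it can be used directly as a row filter, and prefixing it with ~ gives the "not in" case. In the following sketch, the column names letter and value are illustrative only.

```python
import pandas as pd

# Illustrative data: the column names are hypothetical.
df = pd.DataFrame({"letter": list("abcde"), "value": range(5)})

# Keep only the rows whose letter appears in the list.
kept = df[df["letter"].isin(["a", "c", "e"])]
print(kept["value"].tolist())  # [0, 2, 4]

# ~ inverts the mask, keeping the rows whose letter is NOT in the list.
dropped = df[~df["letter"].isin(["a", "c", "e"])]
print(dropped["value"].tolist())  # [1, 3]
```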

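Reindexing vs ix indexing

Many users find themselves using the ix indexer as a concise way of selecting data from a Pandas object. Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the examples that use it require an older version: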

 import pandas as pd
 import numpy as np
 df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three',
 'four'], index=list('abcdef'))
 print(df)
 print(df.ix[['b', 'c', 'e']])  # .ix requires pandas < 1.0

The results of the execution are as follows:

        one        two      three       four
a   -1.582025   1.335773   0.961417  -1.272084
b    1.461512   0.111372  -0.072225   0.553058
c   -1.240671   0.762185   1.511936  -0.630920
d   -2.380648  -0.029981   0.196489   0.531714
e    1.846746   0.148149   0.275398  -0.244559
f   -1.842662  -0.933195   2.303949   0.677641
          one        two      three       four
b    1.461512   0.111372  -0.072225   0.553058
c   -1.240671   0.762185   1.511936  -0.630920
e    1.846746   0.148149   0.275398  -0.244559

In this case, of course, the result is exactly the same as using the reindex method:

 import pandas as pd
 import numpy as np
 df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three',
 'four'], index=list('abcdef'))
 print(df)
 print(df.reindex(['b', 'c', 'e']))

The results of the execution are as follows:

        one        two      three       four
a    1.639081   1.369838   0.261287  -1.662003
b   -0.173359   0.242447  -0.494384   0.346882
c   -0.106411   0.623568   0.282401  -0.916361
d   -1.078791  -0.612607  -0.897289  -1.146893
e    0.465215   1.552873  -1.841959   0.329404
f    0.966022  -0.190077   1.324247   0.678064
          one        two      three       four
b   -0.173359   0.242447  -0.494384   0.346882
c   -0.106411   0.623568   0.282401  -0.916361
e    0.465215   1.552873  -1.841959   0.329404

Some might conclude from this that ix and reindex are 100% equivalent. That holds except in the case of integer indexing. For example, with ix the operation above can also be written using integer positions, but reindex treats the same integers as labels:

 import pandas as pd
 import numpy as np
 df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three',
 'four'], index=list('abcdef'))
 print(df)
 print(df.ix[[1, 2, 4]])  # falls back to positions; requires pandas < 1.0
 print(df.reindex([1, 2, 4]))

The results of the execution are as follows:

        one        two      three       four
a   -1.015695  -0.553847   1.106235  -0.784460
b   -0.527398  -0.518198  -0.710546  -0.512036
c   -0.842803  -1.050374   0.787146   0.205147
d   -1.238016  -0.749554  -0.547470  -0.029045
e   -0.056788   1.063999  -0.767220   0.212476
f    1.139714   0.036159   0.201912   0.710119
          one        two      three       four
b   -0.527398  -0.518198  -0.710546  -0.512036
c   -0.842803  -1.050374   0.787146   0.205147
e   -0.056788   1.063999  -0.767220   0.212476
    one  two  three  four
1   NaN  NaN    NaN   NaN
2   NaN  NaN    NaN   NaN
4   NaN  NaN    NaN   NaN
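Since .ix no longer exists, current pandas splits its two roles between .iloc (positions) and .loc (labels). The sketch below shows those equivalents next to reindex. It uses deterministic values from np.arange in place of the random data above so that the results can be checked. It also shows one further difference: .loc raises KeyError for missing labels (pandas 1.0 and later), whereas reindex fills them with NaN.

```python
import numpy as np
import pandas as pd

# Deterministic values instead of random numbers, so results are checkable.
df = pd.DataFrame(np.arange(24).reshape(6, 4),
                  columns=["one", "two", "three", "four"],
                  index=list("abcdef"))

# .iloc is purely positional: positions 1, 2 and 4 hold rows b, c and e.
by_position = df.iloc[[1, 2, 4]]
print(by_position.index.tolist())  # ['b', 'c', 'e']

# reindex is purely label-based: 1, 2 and 4 are not labels, so all values are NaN.
by_label = df.reindex([1, 2, 4])
print(by_label.isna().all().all())  # True

# .loc is also label-based, but it raises KeyError instead of filling with NaN.
try:
    df.loc[[1, 2, 4]]
except KeyError as err:
    print("KeyError:", err)
```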

It is important to remember that reindex performs strictly label-based indexing. This can lead to surprising results in edge cases where an index contains, for example, both integers and strings.
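A small sketch of such a mixed index follows, using made-up values. With labels "a", 0, and "b", the integer 0 refers to the second row as a label but to the first row as a position. Meanwhile 1, which is a valid position, is not a label at all.

```python
import pandas as pd

# Illustrative Series whose index mixes string labels with an integer label.
s = pd.Series([10, 20, 30], index=["a", 0, "b"])

# reindex treats 0 strictly as a label, so it finds the row labelled 0.
print(s.reindex([0]).tolist())  # [20]

# .iloc treats 0 as a position and returns the first row instead.
print(s.iloc[[0]].tolist())  # [10]

# 1 is a valid position but not a label, so reindex produces NaN.
print(s.reindex([1]).isna().all())  # True
```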