

Points to Note in Pandas

Pandas Notes and Traps

Using if/truth statements with Pandas

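Pandas raises an error when you try to convert an object to bool. This happens in an if statement or when using the boolean operators and, or, and not. It is not clear what the result should be: True because the object is not empty, or False because it contains False values? Because the answer is ambiguous, Pandas raises a ValueError exception.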

Example

    import pandas as pd

    # Using a whole Series as a condition raises ValueError
    if pd.Series([False, True, False]):
        print('I am True')

The results of the execution are as follows:

    ValueError: The truth value of a Series is ambiguous.
    Use a.empty, a.bool(), a.item(), a.any(), or a.all().

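In the if condition, it is unclear what should be tested. The error message asks you to state the intent explicitly, for example by testing against None or by calling one of the listed methods such as any() or all().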

Example

    import pandas as pd

    # any() is True when at least one element is True
    if pd.Series([False, True, False]).any():
        print("I am any")

The results of the execution are as follows:

I am any
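
The options named in the error message each answer a different question about the same Series. The following is a minimal sketch that puts them side by side; the variable names are illustrative and not part of the original example.

```python
import pandas as pd

s = pd.Series([False, True, False])

# any(): is at least one element True?
has_any = s.any()
# all(): is every element True?
has_all = s.all()
# empty: does the Series have zero elements?
is_empty = s.empty

if has_any:
    print("at least one value is True")
if not has_all:
    print("not every value is True")
if not is_empty:
    print("the Series is not empty")
```

Choosing one of these explicitly is what removes the ambiguity: the same Series gives a different answer depending on which question is asked.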

To evaluate a single-element Pandas object in a boolean context, use the .bool() method:

Example

    import pandas as pd

    # bool() works only on an object with exactly one boolean element
    print(pd.Series([True]).bool())

The results of the execution are as follows:

True
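
Newer pandas releases are understood to deprecate .bool() (from version 2.1 onward), and .item(), which the error message also lists, can serve the same purpose there. The sketch below assumes a Series with exactly one element; the variable names are illustrative.

```python
import pandas as pd

single = pd.Series([True])

# item() returns the only element as a plain Python scalar.
value = single.item()
if value:
    print("the single element is True")

# With more than one element there is no single value to return,
# so item() raises ValueError instead of guessing.
try:
    pd.Series([True, False]).item()
    raised = False
except ValueError:
    raised = True
print(raised)
```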

Bitwise boolean values

Comparison operators such as == and != are applied element by element and return a boolean Series, which is almost always what is wanted.

Example

    import pandas as pd

    s = pd.Series(range(5))
    # Element-wise comparison returns a boolean Series
    print(s == 4)

The results of the execution are as follows:

    0    False
    1    False
    2    False
    3    False
    4     True
    dtype: bool
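
Boolean Series like the one above are usually combined and then used to select data. Because the keywords and, or, and not would trigger the truth-value error from the first section, the element-wise operators &, |, and ~ are used instead. The following is a minimal sketch; the names mask, inverted, and selected are illustrative.

```python
import pandas as pd

s = pd.Series(range(5))

# Each comparison yields a boolean Series; parentheses are needed
# because & and | bind more tightly than the comparison operators.
mask = (s > 1) & (s < 4)

# ~ inverts a boolean Series element by element.
inverted = ~mask

# A boolean Series can be used directly to select elements.
selected = s[mask]
print(selected.tolist())
```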

isin operation

isin returns a boolean Series showing whether each element of the Series is exactly contained in the passed sequence of values.

Example

    import pandas as pd

    s = pd.Series(list('abc'))
    # True where the element is one of 'a', 'c', 'e'
    s = s.isin(['a', 'c', 'e'])
    print(s)

The results of the execution are as follows:

    0     True
    1    False
    2     True
    dtype: bool
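
The boolean Series returned by isin is typically used as a filter. The following is a minimal sketch of keeping the matching elements and, with ~, the non-matching ones; the names mask, kept, and dropped are illustrative.

```python
import pandas as pd

s = pd.Series(list('abc'))
mask = s.isin(['a', 'c', 'e'])

kept = s[mask]        # elements found in the list
dropped = s[~mask]    # elements not found in the list
print(kept.tolist())
print(dropped.tolist())
```

Values in the passed list that never occur in the Series, such as 'e' here, simply have no effect on the result.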

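Reindexing vs ix indexing

Many users find themselves using the ix indexer as a concise way of selecting data from a Pandas object. (ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the examples that use it require an older release.)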

Example

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(6, 4),
                      columns=['one', 'two', 'three', 'four'],
                      index=list('abcdef'))
    print(df)
    # .ix is only available in pandas < 1.0
    print(df.ix[['b', 'c', 'e']])

The results of the execution are as follows:

            one       two     three      four
    a -1.582025  1.335773  0.961417 -1.272084
    b  1.461512  0.111372 -0.072225  0.553058
    c -1.240671  0.762185  1.511936 -0.630920
    d -2.380648 -0.029981  0.196489  0.531714
    e  1.846746  0.148149  0.275398 -0.244559
    f -1.842662 -0.933195  2.303949  0.677641
            one       two     three      four
    b  1.461512  0.111372 -0.072225  0.553058
    c -1.240671  0.762185  1.511936 -0.630920
    e  1.846746  0.148149  0.275398 -0.244559

Of course, in this case, this is completely equivalent to using the reindex method:

Example

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(6, 4),
                      columns=['one', 'two', 'three', 'four'],
                      index=list('abcdef'))
    print(df)
    # reindex selects rows by label
    print(df.reindex(['b', 'c', 'e']))

The results of the execution are as follows:

            one       two     three      four
    a  1.639081  1.369838  0.261287 -1.662003
    b -0.173359  0.242447 -0.494384  0.346882
    c -0.106411  0.623568  0.282401 -0.916361
    d -1.078791 -0.612607 -0.897289 -1.146893
    e  0.465215  1.552873 -1.841959  0.329404
    f  0.966022 -0.190077  1.324247  0.678064
            one       two     three      four
    b -0.173359  0.242447 -0.494384  0.346882
    c -0.106411  0.623568  0.282401 -0.916361
    e  0.465215  1.552873 -1.841959  0.329404

Some might conclude from this that ix and reindex are 100% equivalent. That is true except in the case of integer indexing. For example, the ix selection above could also have been written with integer positions, whereas passing the same integers to reindex gives a different result:

Example

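    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(6, 4),
                      columns=['one', 'two', 'three', 'four'],
                      index=list('abcdef'))
    print(df)
    # .ix (pandas < 1.0) falls back to positions on a non-integer index
    print(df.ix[[1, 2, 4]])
    # reindex looks up the labels 1, 2, 4, which do not exist
    print(df.reindex([1, 2, 4]))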

The results of the execution are as follows:

            one       two     three      four
    a -1.015695 -0.553847  1.106235 -0.784460
    b -0.527398 -0.518198 -0.710546 -0.512036
    c -0.842803 -1.050374  0.787146  0.205147
    d -1.238016 -0.749554 -0.547470 -0.029045
    e -0.056788  1.063999 -0.767220  0.212476
    f  1.139714  0.036159  0.201912  0.710119
            one       two     three      four
    b -0.527398 -0.518198 -0.710546 -0.512036
    c -0.842803 -1.050374  0.787146  0.205147
    e -0.056788  1.063999 -0.767220  0.212476
       one  two  three  four
    1  NaN  NaN    NaN   NaN
    2  NaN  NaN    NaN   NaN
    4  NaN  NaN    NaN   NaN

It is important to remember that reindex is strictly label-based indexing. This can lead to surprising results in cases where an index contains both integers and strings.
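
Since ix is not available in pandas 1.0 and later, the same comparison can be sketched with loc (label-based) and iloc (position-based), which separate the two behaviors that ix combined. This is an illustrative sketch rather than part of the original examples; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 4),
                  columns=['one', 'two', 'three', 'four'],
                  index=list('abcdef'))

by_label = df.loc[['b', 'c', 'e']]        # label-based selection
by_reindex = df.reindex(['b', 'c', 'e'])  # label-based, same rows
by_position = df.iloc[[1, 2, 4]]          # position-based selection
missing = df.reindex([1, 2, 4])           # labels 1, 2, 4 do not exist

print(by_label.equals(by_reindex))
print(by_label.equals(by_position))
print(missing.isna().all().all())
```

Here the label-based and position-based selections return the same rows only because 'b', 'c', and 'e' happen to sit at positions 1, 2, and 4. reindex never falls back to positions, so the integer labels produce rows filled with NaN.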
