Pandas

Tips for working with the pandas module.

DataFrame

Find the columns that are in one DataFrame but not in the other (the result is one-sided: columns of df1 that are missing from df2):

import pandas as pd
df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'b': 2, 'd': 3}])
df1.columns.difference(df2.columns)
# Index(['c'], dtype='object')
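Because `difference` only reports one direction, it can help to look at the other direction or at both at once. The following is a minimal sketch using the same two frames; the variable names `only_in_df2` and `in_one_only` are illustrative and not part of the original tip.

```python
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'b': 2, 'd': 3}])

# Columns present in df2 but missing from df1 (the reverse direction).
only_in_df2 = df2.columns.difference(df1.columns)

# Columns present in exactly one of the two frames.
in_one_only = df1.columns.symmetric_difference(df2.columns)

print(list(only_in_df2))
print(list(in_one_only))
```

`Index.symmetric_difference` is convenient when you only care whether the two column sets differ at all, not which side is missing what.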

Check whether the contents of two DataFrames are equal (ignoring the index and the column names):

import numpy as np
df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'b': 2, 'd': 3}])
# Only the raw values are compared; index and column labels are ignored
np.array_equal(df1.values, df2.values)
# True
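To see what "ignoring index and column names" means in practice, the values-only check can be contrasted with `DataFrame.equals`, which also compares the labels. This is a small sketch on the same data; `to_numpy()` is used here in place of `.values`, and the result variable names are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'b': 2, 'd': 3}])

# Values only: the differing column name ('c' vs 'd') is not seen.
values_equal = np.array_equal(df1.to_numpy(), df2.to_numpy())

# Labels and values: the differing column name makes the frames unequal.
labels_and_values_equal = df1.equals(df2)

print(values_equal, labels_and_values_equal)
```

Which of the two checks is appropriate depends on whether the column names carry meaning in your comparison.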

Check whether two DataFrames are equal (the columns must be sorted first, otherwise the test fails because the column order differs):

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'c': 3, 'b': 2}])
pd.testing.assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1))
# Pass
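As an alternative to sorting the columns by hand, `assert_frame_equal` accepts a `check_like` parameter that ignores the order of the index and the columns. The sketch below first shows that the default comparison fails on these frames and then that `check_like=True` passes; the flag names `strict_passes` and `order_ignored_passes` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'c': 3, 'b': 2}])

# Default behaviour: column order matters, so this comparison raises.
try:
    pd.testing.assert_frame_equal(df1, df2)
    strict_passes = True
except AssertionError:
    strict_passes = False

# check_like=True ignores the order of index and column labels,
# so no explicit sort_index(axis=1) is needed.
try:
    pd.testing.assert_frame_equal(df1, df2, check_like=True)
    order_ignored_passes = True
except AssertionError:
    order_ignored_passes = False

print(strict_passes, order_ignored_passes)
```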

If you want to ignore the dtypes of the columns, set check_dtype to False:

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'c': 3, 'b': 2.0}])
pd.testing.assert_frame_equal(
    df1.sort_index(axis=1), df2.sort_index(axis=1), check_dtype=False
)
# Pass
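Since `assert_frame_equal` raises an `AssertionError` instead of returning a value, a thin wrapper can be handy when a boolean is needed outside of a test. The following is a sketch only: `frames_match` is a hypothetical helper, not a pandas function, and it sorts the columns the same way as the examples above.

```python
import pandas as pd


def frames_match(left, right, **kwargs):
    """Hypothetical helper: True if assert_frame_equal passes, else False.

    Extra keyword arguments (e.g. check_dtype) are forwarded unchanged.
    """
    try:
        pd.testing.assert_frame_equal(
            left.sort_index(axis=1), right.sort_index(axis=1), **kwargs
        )
    except AssertionError:
        return False
    return True


df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'c': 3, 'b': 2.0}])

strict = frames_match(df1, df2)                       # dtypes compared: int64 vs float64
relaxed = frames_match(df1, df2, check_dtype=False)   # dtypes ignored

print(strict, relaxed)
```

The strict call reports a mismatch because column b is int64 in one frame and float64 in the other, which is exactly the difference that `check_dtype=False` suppresses.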

However, if a column has the category dtype, the test still fails unless you also explicitly set check_categorical to False:

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'c': 3, 'b': 2}])
df2['c'] = df2['c'].astype('category')
pd.testing.assert_frame_equal(
    df1.sort_index(axis=1), df2.sort_index(axis=1),
    check_dtype=False, check_categorical=False,
)
# Pass
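If relaxing the checks is not desirable, another option is to cast the categorical column back to the dtype of its counterpart before comparing, so that the default strict comparison can be used. This is a sketch under the assumption that the categories convert cleanly to the target dtype (here, integers); `df2_plain` is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'c': 3, 'b': 2}])
df2['c'] = df2['c'].astype('category')

# Cast the categorical column back to the dtype used in df1.
df2_plain = df2.astype({'c': df1['c'].dtype})

# With matching dtypes, the default (strict) comparison passes.
pd.testing.assert_frame_equal(
    df1.sort_index(axis=1), df2_plain.sort_index(axis=1)
)
print(df2_plain.dtypes)
```

This keeps dtype checking active for every other column, at the cost of an explicit conversion step.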

Other configuration parameters for assert_frame_equal can be found in the official pandas documentation.
