Pandas
Tips for working with the pandas module.
DataFrame
Check which columns of one DataFrame are missing from another:
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'b': 2, 'd': 3}])
# Columns that are in df1 but not in df2
df1.columns.difference(df2.columns)
# Index(['c'], dtype='object')
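Note that difference is one-directional: it only lists the columns of df1 that are absent from df2. As a small sketch, Index.symmetric_difference can be used to list the columns that appear in exactly one of the two DataFrames (the variable name only_in_one is just illustrative):

```python
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'b': 2, 'd': 3}])

# Columns present in exactly one of the two DataFrames
only_in_one = df1.columns.symmetric_difference(df2.columns)
print(list(only_in_one))
# ['c', 'd']
```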
Check whether the content of two DataFrames is equal (ignoring the index and the column names):
import numpy as np

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'b': 2, 'd': 3}])
# Compare only the underlying values, not the labels
np.array_equal(df1.to_numpy(), df2.to_numpy())
# True
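One caveat with this approach is missing data: NaN never compares equal to itself, so two frames with NaN in the same positions are reported as different. The sketch below assumes NumPy 1.19 or later, where np.array_equal accepts an equal_nan argument, and assumes purely numeric (float) columns:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([{'a': 1.0, 'b': np.nan}])
df2 = pd.DataFrame([{'x': 1.0, 'y': np.nan}])

# NaN != NaN, so the plain comparison reports a difference
strict = np.array_equal(df1.to_numpy(), df2.to_numpy())
# equal_nan=True treats NaN values in the same position as equal
lenient = np.array_equal(df1.to_numpy(), df2.to_numpy(), equal_nan=True)
print(strict, lenient)
# False True
```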
Check whether two DataFrames are equal (the columns must be sorted first, otherwise the test fails when their order differs):
df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'c': 3, 'b': 2}])
pd.testing.assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1))
# Passes (no AssertionError is raised)
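As an alternative to sorting the columns by hand, assert_frame_equal also has a check_like parameter that ignores the order of the index and the columns. A minimal sketch with the same data:

```python
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'c': 3, 'b': 2}])

# check_like=True ignores the order of index and columns
pd.testing.assert_frame_equal(df1, df2, check_like=True)
# Passes
```

Note that check_like affects both axes, so rows in a different order are also accepted as long as their index labels match.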
If you want to ignore the columns' dtypes, set check_dtype to False:
df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'c': 3, 'b': 2.0}])
# Column 'b' is int64 in df1 and float64 in df2
pd.testing.assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1), check_dtype=False)
# Passes
However, if a column has the category dtype, the test still fails unless you also explicitly set check_categorical to False:
df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'c': 3, 'b': 2}])
df2['c'] = df2['c'].astype('category')
pd.testing.assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1), check_dtype=False, check_categorical=False)
# Passes
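Another possible approach, sketched here as an alternative rather than the recommended way, is to convert the categorical column back to the dtype of its counterpart before comparing. The strict comparison then passes without relaxing any checks:

```python
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'c': 3, 'b': 2}])
df2['c'] = df2['c'].astype('category')

# Convert the categorical column back to the dtype used in df1
df2['c'] = df2['c'].astype(df1['c'].dtype)

pd.testing.assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1))
# Passes
```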
Other configuration parameters for assert_frame_equal can be found in the official
documentation.
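Since assert_frame_equal raises an AssertionError instead of returning a value, it can be convenient outside of tests to wrap it in a function that returns a boolean. The frames_equal helper below is hypothetical (it is not part of pandas) and simply forwards any keyword arguments to assert_frame_equal:

```python
import pandas as pd


def frames_equal(left, right, **kwargs):
    """Hypothetical helper: True if assert_frame_equal passes, else False."""
    try:
        # Sort the columns so that their order does not matter
        pd.testing.assert_frame_equal(
            left.sort_index(axis=1), right.sort_index(axis=1), **kwargs
        )
    except AssertionError:
        return False
    return True


df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}])
df2 = pd.DataFrame([{'a': 1, 'c': 3, 'b': 2.0}])

print(frames_equal(df1, df2))                     # False
print(frames_equal(df1, df2, check_dtype=False))  # True
```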