hevslib.pandas module

Pandas Functions

class hevslib.pandas.DataframeList(filename=None)

Bases: object

Create a new dataframeList

Parameters

filename – path of an existing .h5 file to load, if any

Returns

DataframeList

Raises

None

addDf(name=None, description=None, df=None, log=None)

Add a new dataframe to the dataframeList

Parameters
  • name – name of the DataFrame

  • description – description of the DataFrame

  • df – the DataFrame itself

Returns

None

Raises

None

exportToH5(filename, verbose=True)

Export to a .h5 file

Parameters

filename – filepath and name of where we want to export

Returns

None

Raises

None

getDfFromName(name, log=None)

Get a specific dataframe given its name

Parameters

name – name of the dataframe

Returns

the dataframe

Return type

Pandas Dataframe

Raises

None

getInfo(verbose=False)

Get/Display info about this dataframeList

Parameters

None

Returns

dataframe containing info about the dataframeList

Return type

Pandas Dataframe

Raises

None

hevslib.pandas.absoluteNegValues(df, columns)

Absolute value for specified columns

Parameters
  • df – pandas dataframe

  • columns – list of columns to process

Returns

dataframe with absolute values

Return type

Pandas Dataframe

Raises

None

hevslib.pandas.areTwins(item1, item2, columnsToCompare=None, verbose=False, log=None)

Compare two Series objects and return 1 if they are twins, otherwise 0

Parameters
  • item1 – pandas series to compare

  • item2 – pandas series to compare

  • columnsToCompare – columns to use for the comparison

Returns

variable indicating if items are twins

Return type

boolean

Raises

None

hevslib.pandas.cleanDf(df, verbose=True)
Cleans pandas dataframe
  • Removes Duplicates

  • Removes Finite Columns

  • Removes NaN

Parameters
  • df – pandas dataframe

  • verbose – bool give some informational output

Returns

cleaned dataframe

Return type

Pandas Dataframe

Raises

None
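The three cleaning steps listed above can be approximated with plain pandas calls. The sketch below is not the hevslib implementation: the helper name clean_df_sketch is hypothetical, and it assumes that a "finite" column is one holding a single distinct value (as described for removeFiniteColumns) and that NaN removal drops whole rows.

```python
import numpy as np
import pandas as pd


def clean_df_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for cleanDf, built from plain pandas calls."""
    # Step 1: remove fully duplicated rows.
    out = df.drop_duplicates()
    # Step 2 (assumption): a "finite" column has a single distinct value.
    constant = [c for c in out.columns if out[c].nunique(dropna=False) <= 1]
    out = out.drop(columns=constant)
    # Step 3 (assumption): drop every row that still contains a NaN.
    return out.dropna()


raw = pd.DataFrame({
    "a": [1.0, 1.0, 2.0, np.nan],
    "b": ["x", "x", "y", "z"],
    "site": ["lab", "lab", "lab", "lab"],  # constant column
})
cleaned = clean_df_sketch(raw)
print(cleaned)
```

In this example the duplicate second row, the constant "site" column, and the row with a missing "a" value are all removed.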

hevslib.pandas.convertSecToTimeDuration(df, columns, verbose=False, log=None)

Convert seconds to time durations in the given dataframe columns

Parameters
  • df – pandas dataframe

  • columns – list of existing columns that we want to process

Returns

dataframe with new converted columns

Return type

Pandas Dataframe

Raises

None

hevslib.pandas.convertTimeDurationToSec(df, columns, verbose=True, log=None)

Convert time durations to seconds in the given dataframe columns

Parameters
  • df – pandas dataframe

  • columns – list of existing columns that we want to process

Returns

dataframe with new converted columns

Return type

Pandas Dataframe

Raises

None
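For readers unfamiliar with the underlying conversion, the round trip between seconds and time durations can be sketched with plain pandas. This only illustrates the idea and may differ from what convertSecToTimeDuration and convertTimeDurationToSec do internally; the column names and suffixes are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({"duration_s": [0, 90, 3661]})

# Seconds -> timedelta; the "_td" column name is an assumed naming choice.
df["duration_td"] = pd.to_timedelta(df["duration_s"], unit="s")

# Timedelta -> seconds (as floats), the reverse direction.
df["duration_back_s"] = df["duration_td"].dt.total_seconds()

print(df)
```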

hevslib.pandas.countNaN(df, verbose=False)

Count all NaN cells in a dataframe

Parameters
  • df – pandas dataframe to analyse

  • verbose – bool give some informational output

Returns

number of cells with a NaN Value

Return type

int

Raises

None

hevslib.pandas.countRowWithNaN(df, verbose=False)

Count rows with at least one NaN value

Parameters
  • df – pandas dataframe to analyse

  • verbose – bool give some informational output

Returns

number of rows with a NaN Value

Return type

int

Raises

None
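The difference between counting NaN cells (countNaN) and counting rows that contain a NaN (countRowWithNaN) is easy to see on a small table. The snippet below uses plain pandas as an assumed equivalent and does not call hevslib.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan],
    "b": [np.nan, 2.0, np.nan],
    "c": [1, 2, 3],
})

# Every NaN cell counts once.
nan_cells = int(df.isna().sum().sum())

# A row counts once, no matter how many NaN cells it holds.
rows_with_nan = int(df.isna().any(axis=1).sum())

print(nan_cells, rows_with_nan)
```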

hevslib.pandas.dfInfo(df_name, df_description, df, verbose=True)

Display info about one dataframe

Parameters
  • df_name – string shortname of the dataframe

  • df_description – string description of the dataframe

  • df – pandas dataframe

  • verbose – bool give some informational output

Returns

None

Raises

None

hevslib.pandas.dfInfoAppend(df_information, df_name, df_description, df, verbose=False)

Append dataframe information to a pandas information table

Parameters
  • df_information – existing information table

  • df_name – string shortname of the dataframe

  • df_description – string description of the dataframe

  • df – pandas dataframe

  • verbose – bool give some informational output

Returns

None

Raises

None

hevslib.pandas.dfsInfo(dfs_name, dfs_description, dfs, verbose=False)

Display info about multiple dataframes

Parameters
  • dfs_name – list of strings with the shortnames of the dataframes

  • dfs_description – list of strings with the descriptions of the dataframes

  • dfs – list of pandas dataframes

  • verbose – bool give some informational output

Returns

None

Raises

None

hevslib.pandas.displayDiff(df, index1, index2, columns=None)

Display the differences between two elements (rows) of the dataframe

Parameters
  • df – pandas dataframe

  • index1 – index of the first item we want to compare

  • index2 – index of the second item we want to compare

  • columns – list of columns to display

Returns

None

Raises

None

hevslib.pandas.displayEntryOccurences(df, columns=None, showOccurencesWhen=None)

Display occurrences of the values in selected columns

Parameters
  • df – pandas dataframe

  • columns – list of columns to display, None for all columns of df

  • showOccurencesWhen – int filter selecting which occurrence count to display, None to disable filtering

Returns

None

Raises

None

hevslib.pandas.displayNegTimes(df, column_t1, column_t2, column_deltatime)

Display negative time values of given columns

Parameters
  • df – pandas dataframe

  • column_t1 – datetime first time to display

  • column_t2 – datetime second time to display

  • column_deltatime – deltatime to search for negative values

Returns

dataframe of only selected columns and negative times

Return type

Pandas Dataframe

Raises

None

hevslib.pandas.displaySummary(df, columns=None, verbose=True)

Display a summary of the dataframe

Parameters
  • df – pandas dataframe

  • columns – list of columns to display

Returns

df_summary

Raises

None

hevslib.pandas.fillNaNToZero(df, columns, verbose=True)

Fill all NaN values with 0 for given columns

Parameters
  • df – pandas dataframe with NaN values

  • columns – list of df columns to search for NaN

  • verbose – bool give some informational output

Returns

dataframe with NaN filled by Zeros for selected columns

Return type

Pandas Dataframe

Raises

None

hevslib.pandas.fillNegTime(df, column, verbose=True)

Fill all negative time values with zero time for given columns

Parameters
  • df – pandas dataframe with negative time values

  • column – list of df columns to search for negative times

  • verbose – bool give some informational output

Returns

dataframe with negative times filled by zeros for selected columns

Return type

Pandas Dataframe

Raises

None

hevslib.pandas.fillZeroToNaN(df, columns, verbose=True)

Fill all 0 values with NaN for given columns

Parameters
  • df – pandas dataframe with zero values

  • columns – list of df columns to search for zeros

  • verbose – bool give some informational output

Returns

dataframe with zeros filled by NaN for selected columns

Return type

Pandas Dataframe

Raises

None
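fillNaNToZero and fillZeroToNaN are mirror images of each other, and both touch only the listed columns. A plain pandas sketch of the two directions follows. It is an assumed equivalent, not the library code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [0.0, 1.5, np.nan], "y": [np.nan, 0.0, 2.0]})
columns = ["x"]  # only these columns are processed

# NaN -> 0 in the selected columns.
filled = df.copy()
filled[columns] = filled[columns].fillna(0)

# 0 -> NaN in the selected columns.
blanked = df.copy()
blanked[columns] = blanked[columns].replace(0, np.nan)

print(filled)
print(blanked)
```

Column "y" is left untouched in both results because it is not in the columns list.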

hevslib.pandas.filterByMonth(df, column, addMonth=0, date=None, verbose=False)

Filter a Dataframe by a month in a given column

Parameters
  • df – pandas dataframe

  • column – column with datetime entries

  • addMonth – int jump month in the past or future

  • date – datetime object where year and month is used

  • verbose – bool give some informational output

Returns

dataframe with filtered data

Return type

Pandas Dataframe

Raises

None
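One way to picture month filtering with an offset is shown below. The helper filter_by_month_sketch is hypothetical, and it assumes that date defaults to the current date and that the month offset is applied relative to the month of date. The actual behaviour of filterByMonth may differ.

```python
import datetime as dt

import pandas as pd


def filter_by_month_sketch(df, column, add_month=0, date=None):
    """Hypothetical equivalent: keep rows whose `column` lies in the target month."""
    # Assumption: without an explicit date, the current month is the reference.
    date = date or dt.datetime.now()
    # Shift the reference month (negative = past, positive = future).
    target = pd.Period(date, freq="M") + add_month
    return df[df[column].dt.to_period("M") == target]


events = pd.DataFrame({
    "when": pd.to_datetime(["2023-12-31", "2024-01-15", "2024-02-01"]),
    "value": [1, 2, 3],
})
reference = dt.datetime(2024, 2, 10)
previous_month = filter_by_month_sketch(events, "when", add_month=-1, date=reference)
print(previous_month)
```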

hevslib.pandas.filterByWeek(df, column, addWeek=0, date=None, verbose=False)

Filter a Dataframe by a week in a given column

Parameters
  • df – pandas dataframe

  • column – column with datetime entries

  • addWeek – int jump week in the past or future

  • date – datetime object where year and month is used

  • verbose – bool give some informational output

Returns

dataframe with filtered data

Return type

Pandas Dataframe

Raises

None

hevslib.pandas.filterRows(df, filter, type='eq', verbose=False, log=None)

Filter dataframe by filter criteria (keep values defined in filter criteria)

Parameters
  • df – pandas dataframe

  • filter – list ["<column>", [<filtervalue_1>, <filtervalue_2>]]

  • type – string ("eq"|"neq"|"lt"|"lte"|"gt"|"gte")

  • verbose – bool give some informational output

Returns

dataframe with filtered data

Return type

Pandas Dataframe

Raises

None
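The filter argument pairs a column name with a list of values, and type selects the comparison. The sketch below shows one plausible reading of that contract using plain pandas. The helper name is hypothetical, and the handling of the ordering operators (comparing against only the first filter value) is an assumption rather than documented behaviour.

```python
import operator

import pandas as pd

_COMPARATORS = {
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def filter_rows_sketch(df, flt, kind="eq"):
    """Hypothetical equivalent of filterRows for a [column, [values]] filter."""
    column, values = flt
    if kind == "eq":
        mask = df[column].isin(values)
    elif kind == "neq":
        mask = ~df[column].isin(values)
    elif kind in _COMPARATORS:
        # Assumption: ordering comparisons use only the first filter value.
        mask = _COMPARATORS[kind](df[column], values[0])
    else:
        raise ValueError(f"unknown filter type: {kind!r}")
    return df[mask]


readings = pd.DataFrame({
    "city": ["Sion", "Bern", "Sion", "Basel"],
    "temp": [3, 8, 5, 11],
})
kept = filter_rows_sketch(readings, ["city", ["Sion", "Basel"]])
warm = filter_rows_sketch(readings, ["temp", [5]], kind="gt")
print(kept)
print(warm)
```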

hevslib.pandas.findTwins(df, columns=None, verbose=False)

Find twins in a dataframe (rows with same values)

Parameters
  • df – pandas dataframe

  • columns – list of dataframe columns to use to find twins

Returns

  • dataframe with new column containing the list of twins

  • list of the index of all the twins in the dataframe

Return type

tuple(Pandas Dataframe, list)

Raises

None
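To make the return value concrete, the sketch below builds a comparable result by hand: a copy of the dataframe with a column listing each row's twins, plus the list of all index labels that have at least one twin. The helper name and the "twins" column name are assumptions, and the real findTwins may order or label things differently.

```python
import pandas as pd


def find_twins_sketch(df, columns=None):
    """Hypothetical equivalent of findTwins using plain Python grouping."""
    columns = list(df.columns) if columns is None else list(columns)
    # One hashable key per row, built from the compared columns.
    keys = list(df[columns].itertuples(index=False, name=None))
    by_key = {}
    for idx, key in zip(df.index, keys):
        by_key.setdefault(key, []).append(idx)
    # For each row, list the other rows sharing the same key.
    twins = [[i for i in by_key[k] if i != idx] for idx, k in zip(df.index, keys)]
    out = df.copy()
    out["twins"] = pd.Series(twins, index=df.index)  # assumed column name
    twin_index = [idx for idx, k in zip(df.index, keys) if len(by_key[k]) > 1]
    return out, twin_index


people = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cy"],
    "age": [30, 41, 30, 30],
})
with_twins, twin_index = find_twins_sketch(people, ["name", "age"])
print(with_twins)
print(twin_index)
```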

hevslib.pandas.fixTypes(df, dtypes, verbose=True, log=None)

Changes types of columns

Parameters
  • df – pandas input table with set of given columns

  • dtypes – dictionary of types for the table columns

  • verbose – bool give some informational output

Returns

dataframe with changed columns types

Return type

Pandas Dataframe

Raises

None
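Assuming dtypes maps column names to target types, the effect resembles a call to DataFrame.astype. The example below uses only pandas, and the column names and dtype strings are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "2", "3"],
    "price": ["9.5", "10", "11.25"],
    "when": ["2024-01-01", "2024-01-02", "2024-01-03"],
})

# Assumed shape of the dtypes argument: {column name: target dtype}.
dtypes = {"id": "int64", "price": "float64", "when": "datetime64[ns]"}
typed = df.astype(dtypes)

print(typed.dtypes)
```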

hevslib.pandas.getSummary(df, columns=None)
Compute a summary of the dataframe

Like describe(), but transposed and with different columns

Parameters
  • df – pandas dataframe

  • columns – list of columns to display

Returns

summary

Return type

Pandas Dataframe

Raises

None
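As a rough picture of a transposed describe(), the snippet below produces one row per dataframe column and adds one extra statistic. Which statistics getSummary actually reports is not documented here, so the extra "n_unique" column is purely an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "height": [1.0, 2.0, 3.0],
    "weight": [10.0, 20.0, 60.0],
})

# describe() puts statistics in rows; transposing gives one row per column.
summary = df.describe().T

# Illustrative extra statistic (assumed, not taken from hevslib).
summary["n_unique"] = df.nunique()

print(summary)
```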

hevslib.pandas.keepColumns(df, columns_keep, text=None, verbose=True)

Keep only the specified columns

Parameters
  • df – pandas dataframe

  • columns_keep – list of columns to keep

  • text – string to print

  • verbose – bool give some informational output

Returns

dataframe with selected columns

Return type

Pandas Dataframe

Raises

None

hevslib.pandas.listUniqueValues(df, columns)

Display unique values of given columns

Parameters
  • df – pandas dataframe

  • columns – list of columns to display

Returns

None

Raises

None

hevslib.pandas.removeColumns(df, columns, text=None, verbose=True, log=None)

Removes all specified columns

Parameters
  • df – pandas dataframe

  • columns – list of columns to remove

  • text – string to print

  • verbose – bool give some informational output

Returns

dataframe without the specified columns

Return type

Pandas Dataframe

Raises

None

hevslib.pandas.removeColumnsLessThan(df, minRowNbr=1000, verbose=True)

Removes all columns with fewer than minRowNbr non-NaN values

Parameters
  • df – pandas dataframe

  • minRowNbr – int minimum number of non-NaN values a column needs in order to be kept

  • verbose – bool give some informational output

Returns

dataframe without these columns

Return type

Pandas Dataframe

Raises

None
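The threshold rule can be illustrated with DataFrame.count, which returns the number of non-NaN values per column. This is an assumed pandas equivalent with a small threshold chosen for readability. The library default is 1000.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "full": [1, 2, 3, 4],
    "sparse": [np.nan, np.nan, np.nan, 1.0],
    "half": [1.0, np.nan, 2.0, np.nan],
})
min_row_nbr = 2  # stand-in for the minRowNbr argument

# Keep only the columns with at least min_row_nbr non-NaN values.
kept = df.loc[:, df.count() >= min_row_nbr]

print(list(kept.columns))
```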

hevslib.pandas.removeDuplicates(df, verbose=True)

Removes all duplicates from a table

Parameters
  • df – pandas dataframe

  • verbose – bool give some informational output

Returns

dataframe without duplicates

Return type

Pandas Dataframe

Raises

None

hevslib.pandas.removeFiniteColumns(df, verbose=True)

Removes all columns with only one value

Parameters
  • df – pandas dataframe

  • verbose – bool give some informational output

Returns

dataframe without the finite columns

Return type

Pandas Dataframe

Raises

None

hevslib.pandas.removeNaN(df, column=None, verbose=True)

Removes all NaN values from a pandas dataframe

Parameters
  • df – pandas dataframe with NaN values

  • column – list of columns in which NaN values will be removed; if None, NaN values are removed in all columns

  • verbose – bool give some informational output

Returns

dataframe without any NaN values

Return type

Pandas Dataframe

Raises

None

hevslib.pandas.reorderColumns(df, columns, log=None)

Reorder columns of a dataframe

Parameters
  • df – pandas dataframe

  • columns – list of dataframe columns in the order we want them to appear in the df

Returns

dataframe with ordered columns

Return type

Pandas Dataframe

Raises

None

hevslib.pandas.saveDfCsv(df, name, outputDir)

Export a dataframe to a csv file

Parameters
  • df – dataframe to export

  • name – name of the csv file

  • outputDir – directory path where we want to export

Returns

None

Raises

None

hevslib.pandas.testNaT(df, columns, verbose=1)

Check whether not-a-time (NaT) values exist in a pandas dataframe

Parameters
  • df – pandas dataframe

  • columns – list of columns to search

  • verbose – int level of informational output (1|2)

Returns

variable indicating whether there are NaT values

Return type

Bool

Raises

None

hevslib.pandas.testNegTime(df, columns, verbose=1)

Check whether negative time values exist in a pandas dataframe

Parameters
  • df – pandas dataframe

  • columns – list of columns to search

  • verbose – int level of informational output (1|2|3)

Returns

variable indicating whether there are negative time values

Return type

Bool

Raises

None
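Both testNaT and testNegTime return a single boolean for the checked columns. The plain pandas checks below show what such tests might look like. They are assumptions for illustration, and the column names are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "start": pd.to_datetime(["2024-01-01 10:00", "2024-01-01 12:00", None]),
    "end": pd.to_datetime(["2024-01-01 11:00", "2024-01-01 11:30", "2024-01-02 09:00"]),
})
df["delta"] = df["end"] - df["start"]

# NaT check: True if any datetime cell in the listed columns is missing.
has_nat = bool(df[["start", "end"]].isna().any().any())

# Negative-time check: True if any duration is below zero.
has_neg_time = bool((df["delta"] < pd.Timedelta(0)).any())

print(has_nat, has_neg_time)
```

The second row ends before it starts, which produces a negative duration, and the third row has a missing start time.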

hevslib.pandas.testNull(df, columns, verbose=1, log=None)

Check whether null (NaN) values exist in a pandas dataframe

Parameters
  • df – pandas dataframe

  • columns – list of columns to search

  • verbose – bool give some informational output

Returns

variable indicating whether there are null values

Return type

Bool

Raises

None