hevslib.scikitlearn module

hevslib - SciKit Learn functions

hevslib.scikitlearn.addFamilyCount(df)

Add a new column, `family_count`, to the dataframe, containing the number of rules that belong to the same family

Parameters

df – The dataframe; must contain a `rules_list` column

Returns

The dataframe with the new `family_count` column

Return type

Pandas Dataframe

Raises

None

hevslib.scikitlearn.categoryToInt(df, verbose=True)

Encodes columns of type category to integers

Parameters
  • df – pandas dataframe

  • verbose – bool; if True, give some informational output

Returns

  • dataframe: the encoded dataframe

  • dict: dictionary containing the label encoder objects

Return type

tuple(Pandas Dataframe, Dict)

Raises

None
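
The following is a minimal sketch of the kind of encoding categoryToInt describes, not the hevslib implementation. It assumes one scikit-learn LabelEncoder per column of dtype category, returned in a dict keyed by column name; the function name `category_to_int_sketch` is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def category_to_int_sketch(df):
    """Hypothetical stand-in: label-encode every column of dtype 'category'."""
    encoded = df.copy()
    encoders = {}
    for col in df.select_dtypes(include="category").columns:
        enc = LabelEncoder()
        # LabelEncoder numbers the labels following their sorted order.
        encoded[col] = enc.fit_transform(df[col].astype(str).tolist())
        encoders[col] = enc
    return encoded, encoders


df = pd.DataFrame({"color": pd.Categorical(["red", "blue", "red"]), "size": [1, 2, 3]})
encoded_df, encoders = category_to_int_sketch(df)
```

Keeping the fitted encoders makes the mapping reversible: `encoders["color"].inverse_transform([0])` recovers the original label.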

hevslib.scikitlearn.checkIfIsFamily(ruleset, family)

Check if a ruleset is part of a family

Parameters
  • ruleset – The set of rules to check

  • family – The family, must be a list

Returns

1 if ruleset is part of the family, 0 otherwise

Return type

Int (1 or 0, used as a boolean flag)

Raises

None

hevslib.scikitlearn.convertBackOneHotRules(forest_df, label_encoders, log=None)

Convert the one-hot rules of a forest dataframe back to categories, so that the rules can be applied directly to the original data

Parameters
  • forest_df – dataframe containing the information about the forest (as returned by getForestInfo)

  • label_encoders – the encoders (as returned by categoryToInt)

Returns

forest dataframe with converted rules

Return type

Pandas Dataframe

Raises

None

hevslib.scikitlearn.encodeOneHot(df, columns, verbose=True)

Encodes columns as one-hot vectors, scikit-learn style

Parameters
  • df – pandas dataframe

  • columns – list of columns to encode

  • verbose – bool; if True, give some informational output

Returns

  • dataframe: the encoded dataframe

  • dict: dictionary containing the one-hot encoder objects

Return type

tuple(Pandas Dataframe, Dict)

Raises

None

hevslib.scikitlearn.encodePdOneHot(df, columns, verbose=True, log=None)

Encodes columns as one-hot vectors, pandas style

Parameters
  • df – pandas dataframe

  • columns – list of columns to encode

  • verbose – bool; if True, give some informational output

Returns

the encoded dataframe

Return type

Pandas Dataframe

Raises

None
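
As an illustration of one-hot encoding "pandas style", the sketch below uses `pandas.get_dummies`. That encodePdOneHot relies on this call is an assumption; the sample dataframe is made up.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

# get_dummies replaces each listed column with one indicator column per value;
# columns that are not listed are kept unchanged and come first.
encoded = pd.get_dummies(df, columns=["color"])
```

The new columns are named `<column>_<value>`, here `color_blue` and `color_red`.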

hevslib.scikitlearn.extractFamilyRoot(df)

Extract and list all root families from a dataframe. A family with only 1 condition is considered root.

Parameters

df – The dataframe; must contain a `rules_list` column

Returns

list of root families

Return type

List

Raises

None

hevslib.scikitlearn.extractThresholdList(df, family)

Extract and list all thresholds of a family from a dataframe

Parameters
  • df – The dataframe; must contain a `rules_list` column

  • family – the family whose thresholds are extracted

Returns

list of threshold values

Return type

List

Raises

None

hevslib.scikitlearn.formatRuleListForFilter(rule_list, verbose=False, log=None)

Format a list of rules given by a forest/tree to be compatible with the filterRows function

Parameters
  • rule_list – List of rules to format

  • verbose – bool; if True, give some informational output

Returns

  • list: the rules, formatted as [[feature1, threshold1], …, [featureN, thresholdN]]

  • list: the operation of each rule (lte | gt | eq | neq)

Return type

tuple(List, List)

Raises

None
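
To make the two returned lists concrete, here is a small sketch of the output shape. It is not the hevslib code: it assumes rules are strings of the form "feature sign threshold" with numeric thresholds, and the sign-to-operation mapping and the name `format_rules_sketch` are hypothetical.

```python
# Assumed mapping from comparison sign to the operation names listed above.
SIGN_TO_OP = {"<=": "lte", ">": "gt", "==": "eq", "!=": "neq"}


def format_rules_sketch(rule_list):
    """Split 'feature sign threshold' strings into [feature, threshold] pairs and operations."""
    rules, operations = [], []
    for rule in rule_list:
        feature, sign, threshold = rule.split()
        rules.append([feature, float(threshold)])
        operations.append(SIGN_TO_OP[sign])
    return rules, operations


rules, operations = format_rules_sketch(["age <= 30.5", "income > 1000"])
```

The two lists are index-aligned: `operations[i]` is the comparison that applies to `rules[i]`.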

hevslib.scikitlearn.getFamily(rule)

Return the family of a rule (feature + sign)

Parameters

rule – the rule (feature + sign + threshold)

Returns

the family (feature + sign)

Return type

String

Raises

None
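
A minimal sketch of the rule-to-family relationship, assuming a rule is a string such as "A < 5" with spaces around the sign (the format used in the mergeRulesOrder example). The name `get_family_sketch` is hypothetical.

```python
def get_family_sketch(rule):
    """Return 'feature sign' from a rule written as 'feature sign threshold'."""
    # Split from the right so that feature names containing spaces stay intact.
    feature, sign, _threshold = rule.rsplit(" ", 2)
    return f"{feature} {sign}"


family = get_family_sketch("petal width <= 0.8")
```

Two rules that differ only in their threshold, such as "A < 5" and "A < 7", therefore share the family "A <".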

hevslib.scikitlearn.getFamilySampleCount(df, thickness, family)

Extract the number of samples filtered by the family from a dataframe

Parameters
  • df – The score dataframe; must contain a `rules_weight_{thickness}` column

  • thickness – the thickness to use

  • family – the family whose sample count is wanted

Returns

the number of samples filtered by the family

Return type

Int

Raises

None

hevslib.scikitlearn.getForestInfo(forest, df)

Get information about a sklearn forest classifier

Parameters
  • forest – sklearn forest classifier model

  • df – dataframe that was used to train the model; needed to get the feature names

Returns

dataframe containing the information about the forest

Return type

Pandas Dataframe

Raises

None

hevslib.scikitlearn.getReverseFamily(family, log=None)

Return the inverse of a family (feature + inverted sign)

Parameters

family – the family

Returns

the inverse of the family

Return type

String

Raises

None
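
The inversion can be sketched as a lookup of the opposite sign. The sign pairs below and the name `reverse_family_sketch` are assumptions; the set of signs hevslib actually handles is not documented here.

```python
# Assumed sign pairs: each sign maps to its logical opposite.
OPPOSITE_SIGN = {"<=": ">", ">": "<=", "<": ">=", ">=": "<", "==": "!=", "!=": "=="}


def reverse_family_sketch(family):
    """Return the family with the same feature and the inverted sign."""
    feature, sign = family.rsplit(" ", 1)
    return f"{feature} {OPPOSITE_SIGN[sign]}"


reversed_family = reverse_family_sketch("age <=")
```

Because the mapping is symmetric, reversing a family twice gives back the original family.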

hevslib.scikitlearn.getSubFamily(ruleset_list, family_list)

Return the list of sub-families that are present in ruleset_list, excluding those in family_list

Parameters
  • ruleset_list – list of rulesets from which to extract the families

  • family_list – list of families to exclude from the returned list

Returns

The sub_family list

Return type

List

Raises

None

hevslib.scikitlearn.getThreshold(rule)

Return the threshold of a rule

Parameters

rule – the rule

Returns

the threshold

Return type

Float

Raises

None

hevslib.scikitlearn.getTreeInfo(tree_index=0, tree=None, df=None)

Get information about a sklearn decision tree

Parameters
  • tree_index – optional index if we have multiple trees

  • tree – sklearn tree model

  • df – dataframe that was used to train the tree; needed to get the feature names

Returns

dataframe containing the information about the tree

Return type

Pandas Dataframe

Raises

None
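
For orientation, the sketch below shows how rules and leaf sizes can be read from a fitted scikit-learn tree through its `tree_` attribute. It is not the getTreeInfo implementation, and the columns of the dataframe that function returns are not reproduced; `leaf_rules_sketch` and the toy data are hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier


def leaf_rules_sketch(tree, feature_names):
    """Return one (rules, n_samples) pair per leaf of a fitted sklearn tree."""
    t = tree.tree_
    leaves = []

    def walk(node, rules):
        left, right = t.children_left[node], t.children_right[node]
        if left == right:  # sklearn marks a leaf with -1 for both children
            leaves.append((rules, int(t.n_node_samples[node])))
            return
        name = feature_names[t.feature[node]]
        threshold = float(t.threshold[node])
        walk(left, rules + [f"{name} <= {threshold}"])  # left branch: feature <= threshold
        walk(right, rules + [f"{name} > {threshold}"])  # right branch: feature > threshold

    walk(0, [])
    return leaves


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
leaves = leaf_rules_sketch(clf, ["x"])
```

The tree stores features by index only, which is why the training dataframe is needed: its column names supply the `feature_names` list.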

hevslib.scikitlearn.groupSimilarRules(df)

Count the number of occurrences and of similar rules inside a dataframe

  • Occurrence: Count of rules that have exactly the same features, signs and thresholds

  • Similar: Count of rules that have exactly the same features and signs but with different thresholds

Parameters

df – The dataframe; must contain the columns `rules_list` and `forest_id`

Returns

The dataframe with columns `rules_list`, `occurence`, `origin`, `similar_rules`

Return type

Pandas Dataframe

Raises

None
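
The difference between the two counts can be sketched with plain Python counters. This is an illustration under assumptions, not the hevslib code: rulesets are lists of "feature sign threshold" strings, rule order is treated as significant, and `family_key` is a hypothetical helper.

```python
from collections import Counter


def family_key(ruleset):
    """Drop the thresholds: ['A < 5', 'B > 3'] -> ('A <', 'B >')."""
    return tuple(rule.rsplit(" ", 1)[0] for rule in ruleset)


rulesets = [["A < 5", "B > 3"], ["A < 5", "B > 3"], ["A < 6", "B > 3"], ["C > 1"]]

# Occurrence: rulesets identical down to the thresholds.
occurrence = Counter(tuple(r) for r in rulesets)
# Rulesets sharing features and signs, thresholds ignored.
same_family = Counter(family_key(r) for r in rulesets)

summary = {
    ruleset: {
        "occurrence": count,
        # Similar: same features and signs, but at least one different threshold.
        "similar": same_family[family_key(ruleset)] - count,
    }
    for ruleset, count in occurrence.items()
}
```

Here the ruleset `("A < 5", "B > 3")` occurs twice and has one similar ruleset, the one with threshold 6 on A.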

hevslib.scikitlearn.keepOnlyBiggestLeaf(forest_df, log=None)

Prune sklearn forest to keep only the branch leading to the biggest leaf in each tree

Parameters

forest_df – dataframe containing information about the forest (given by getForestInfo)

Returns

  • dataframe containing only the branch leading to the biggest leaf of each tree

  • list containing the IDs of problematic trees

Return type

tuple(Pandas Dataframe, List)

Raises

None

hevslib.scikitlearn.mergeRulesOrder(list_of_ruleset)

Merge a list of similar rulesets that differ only in rule order, keeping only the order that appears the most

Parameters

list_of_ruleset – List of similar rulesets in the format: [[A < 5, B > 3], [B > 3, A < 5], [B > 3, A < 5], …]

Returns

The ruleset in its most frequent order, e.g. [B > 3, A < 5]

Return type

List

Raises

None
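
The majority-order selection described above can be sketched with `collections.Counter`. The function name `merge_rules_order_sketch` is hypothetical, and the tie-breaking behaviour (first order seen wins) is a property of this sketch, not a documented property of mergeRulesOrder.

```python
from collections import Counter


def merge_rules_order_sketch(list_of_ruleset):
    """Return the rule ordering that appears most often in the input."""
    # Lists are not hashable, so each ruleset is counted as a tuple.
    counts = Counter(tuple(ruleset) for ruleset in list_of_ruleset)
    most_common_order, _count = counts.most_common(1)[0]
    return list(most_common_order)


merged = merge_rules_order_sketch(
    [["A < 5", "B > 3"], ["B > 3", "A < 5"], ["B > 3", "A < 5"]]
)
```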

hevslib.scikitlearn.replaceSignToText(family)

Convert the sign of a family to its text equivalent, for use in HTML file names that contain the family

Parameters

family – The family

Returns

The family with the sign converted to text

Return type

String

Raises

None

hevslib.scikitlearn.scaleMinMax(df, center_zero=False, verbose=True)

Scales values with the min-max method: (x_i - min(x)) / (max(x) - min(x)). Applies to all columns of type int64 and float64.

Parameters
  • df – pandas dataframe

  • center_zero – bool; if True, center the scaled values around zero

  • verbose – bool; if True, give some informational output

Returns

the scaled dataframe

Return type

Pandas Dataframe

Raises

None
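
The min-max formula above can be sketched in pandas as follows. This is an illustration only: `scale_min_max_sketch` is a hypothetical name, and the `center_zero` option is left out because its exact behaviour is not specified here.

```python
import pandas as pd


def scale_min_max_sketch(df):
    """Apply (x - min) / (max - min) to every int64 and float64 column."""
    scaled = df.copy()
    for col in df.select_dtypes(include=["int64", "float64"]).columns:
        col_min, col_max = df[col].min(), df[col].max()
        # A constant column has max == min and would produce NaN here.
        scaled[col] = (df[col] - col_min) / (col_max - col_min)
    return scaled


df = pd.DataFrame({"a": [0, 5, 10], "b": [1.0, 2.0, 3.0], "label": ["x", "y", "z"]})
scaled = scale_min_max_sketch(df)
```

Each numeric column ends up in the range [0, 1], while the non-numeric `label` column is left untouched.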

hevslib.scikitlearn.scaleStandard(df, verbose=True)

Scales values with the standard method: (x_i - mean(x)) / stdev(x). Applies to all columns of type int64 and float64.

Parameters
  • df – pandas dataframe

  • verbose – bool; if True, give some informational output

Returns

the scaled dataframe

Return type

Pandas Dataframe

Raises

None
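
A short pandas sketch of the standardization formula follows. One detail the formula leaves open is which standard deviation is meant: the sketch assumes the population form (`ddof=0`), which is what scikit-learn's StandardScaler uses, whereas the pandas default for `std()` is the sample form (`ddof=1`). Which one scaleStandard applies is not stated here.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "label": ["x", "y", "z"]})

scaled = df.copy()
for col in df.select_dtypes(include=["int64", "float64"]).columns:
    # ddof=0 selects the population standard deviation (an assumption, see above).
    scaled[col] = (df[col] - df[col].mean()) / df[col].std(ddof=0)
```

After scaling, each numeric column has mean 0 and, under the `ddof=0` convention, standard deviation 1.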

hevslib.scikitlearn.trainTestSplitTarget(df, target, by=None, testSize=0.2)

Splits a dataset into train and test sets by a feature

Parameters
  • df – pandas dataframe, the ML feature set

  • target – pandas dataframe, the target feature set

  • by – optional feature to split by (default None)

  • testSize – size of the test set (default 0.2)

Returns

xTrain, xTest, yTrain and yTest dataframes

Return type

Pandas Dataframes

Raises

None
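
For comparison, a plain train/test split with scikit-learn returns its four parts in the same order as listed above. The sketch below shows only that baseline; how the `by` argument changes the split is not covered, and whether trainTestSplitTarget wraps `train_test_split` is an assumption. The sample data is made up.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"x1": range(10), "x2": range(10, 20)})
target = pd.Series([0, 1] * 5, name="y")

# test_size=0.2 mirrors the testSize default in the signature above.
x_train, x_test, y_train, y_test = train_test_split(
    df, target, test_size=0.2, random_state=0
)
```

The feature and target parts stay aligned: `x_train` and `y_train` share the same index, as do `x_test` and `y_test`.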