hevslib.scikitlearn module

hevslib - scikit-learn functions

hevslib.scikitlearn.addFamilyCount(df)

Add a new column (family_count) to the dataframe containing the number of rules of the same family

Parameters: df – The dataframe; must contain a `rules_list` column
Returns: The dataframe with the new column “family_count”
Return type: Pandas Dataframe
Raises: None –

hevslib.scikitlearn.categoryToInt(df, verbose=True)

Encodes columns of type category to integers

Parameters

df – pandas dataframe
verbose – bool give some informational output

Returns

dataframe: the encoded dataframe
Dict: dictionary containing the label encoder objects

Return type

Pandas Dataframe

Raises

None –
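
As a rough illustration of this kind of encoding, the sketch below label-encodes every `category` column with scikit-learn's `LabelEncoder` and keeps one encoder per column. The helper name `category_to_int_sketch` and the returned `(dataframe, dict)` pair are assumptions made for the example; this is not the hevslib implementation.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def category_to_int_sketch(df):
    """Hypothetical helper: label-encode every column of dtype 'category'."""
    encoded = df.copy()
    encoders = {}
    for col in encoded.select_dtypes(include="category").columns:
        enc = LabelEncoder()
        # LabelEncoder assigns integers following the sorted order of the labels
        encoded[col] = enc.fit_transform(encoded[col].astype(str).to_numpy())
        encoders[col] = enc
    return encoded, encoders


df = pd.DataFrame({
    "color": pd.Categorical(["red", "blue", "red"]),
    "size": [1.0, 2.0, 3.0],
})
encoded_df, encoders = category_to_int_sketch(df)
```

Keeping the fitted encoders makes it possible to map the integers back to the original labels later with `inverse_transform`.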

hevslib.scikitlearn.checkIfIsFamily(ruleset, family)

Check if a ruleset is part of a family

Parameters

ruleset – The set of rules to check
family – The family, must be a list

Returns

1 if ruleset is part of the family, 0 otherwise

Return type

Boolean

Raises

None –

hevslib.scikitlearn.convertBackOneHotRules(forest_df, label_encoders, log=None)

Convert one-hot rules in a forest dataframe back to categories so they can be used directly on the original data

Parameters

forest_df – dataframe containing the information about the forest (returned by the getForestInfo function)
label_encoders – the one-hot encoder (returned by the categoryToInt function)

Returns

forest dataframe with converted rules

Return type

Pandas Dataframe

Raises

None –

hevslib.scikitlearn.encodeOneHot(df, columns, verbose=True)

Encodes columns as one-hot vectors, sklearn style

Parameters

df – pandas dataframe
columns – list of columns to encode
verbose – bool give some informational output

Returns

dataframe: the encoded dataframe
Dict: dictionary containing the one-hot encoder objects

Return type

Pandas Dataframe

Raises

None –
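
A minimal sketch of sklearn-style one-hot encoding, assuming one `OneHotEncoder` per column and indicator columns named `<column>_<value>`. Both the helper name and the naming scheme are illustrative assumptions, not taken from hevslib.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder


def encode_one_hot_sketch(df, columns):
    """Hypothetical helper: one-hot encode `columns` with scikit-learn."""
    result = df.drop(columns=columns)
    encoders = {}
    for col in columns:
        enc = OneHotEncoder(handle_unknown="ignore")
        # fit_transform returns a sparse matrix by default; densify it
        dense = enc.fit_transform(df[[col]]).toarray()
        names = [f"{col}_{cat}" for cat in enc.categories_[0]]
        result = result.join(pd.DataFrame(dense, columns=names, index=df.index))
        encoders[col] = enc
    return result, encoders


df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})
one_hot_df, one_hot_encoders = encode_one_hot_sketch(df, ["color"])
```

The fitted encoders can be reused to transform new data with the same category-to-column mapping.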

hevslib.scikitlearn.encodePdOneHot(df, columns, verbose=True, log=None)

Encodes columns as one-hot vectors, pandas style

Parameters

df – pandas dataframe
columns – list of columns to encode
verbose – bool give some informational output

Returns

dataframe encoded

Return type

Pandas Dataframe

Raises

None –
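
For comparison, pandas-style one-hot encoding is commonly done with `pandas.get_dummies`. The snippet below is a sketch of that approach on made-up data; whether hevslib calls `get_dummies` internally is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

# get_dummies replaces each listed column with one indicator column per value
dummies_df = pd.get_dummies(df, columns=["color"], dtype=int)
```

Unlike the sklearn approach, no encoder object is kept, which matches the single dataframe return value documented above.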

hevslib.scikitlearn.extractFamilyRoot(df)

Extract and list all root families from a dataframe. A family with only 1 condition is considered root.

Parameters: df – The dataframe; must contain a `rules_list` column
Returns: list of root families
Return type: List
Raises: None –

hevslib.scikitlearn.extractThresholdList(df, family)

Extract and list all thresholds of a family from a dataframe

Parameters

df – The dataframe; must contain a `rules_list` column
family – the family whose thresholds are extracted

Returns

list of threshold values

Return type

List

Raises

None –

hevslib.scikitlearn.formatRuleListForFilter(rule_list, verbose=False, log=None)

Format a list of rules given by a forest/tree to be compatible with the filterRows function

Parameters

rule_list – List of rules to format
verbose – bool give some informational output

Returns

list: rules formatted as [[feature1, threshold1], …, [featureN, thresholdN]]
list: operations (lte | gt | eq | neq)

Return type

tuple(List, List)

Raises

None –
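
The output format above can be sketched as follows. The sketch assumes each input rule is a whitespace-separated string `feature sign threshold`, and that the signs `<=`, `>`, `==` and `!=` map to `lte`, `gt`, `eq` and `neq`. Only the operation names come from this page; the input format, the mapping and the helper name are assumptions.

```python
# Assumed mapping from comparison signs to the operation names listed above
SIGN_TO_OP = {"<=": "lte", ">": "gt", "==": "eq", "!=": "neq"}


def format_rules_sketch(rule_list):
    """Hypothetical helper: split 'feature sign threshold' strings."""
    rules, operations = [], []
    for rule in rule_list:
        feature, sign, threshold = rule.split()
        rules.append([feature, float(threshold)])
        operations.append(SIGN_TO_OP[sign])
    return rules, operations


rules, operations = format_rules_sketch(["age <= 30.5", "income > 1000"])
```

The two lists are index-aligned: `operations[i]` is the comparison to apply to `rules[i]`.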

hevslib.scikitlearn.getFamily(rule)

Return the family of a rule (feature + sign)

Parameters: rule – the rule (feature + sign + threshold)
Returns: the family (feature + sign)
Return type: String
Raises: None –

hevslib.scikitlearn.getFamilySampleCount(df, thickness, family)

Extract the number of samples filtered by the family from a dataframe

Parameters

df – The score dataframe; must contain a `rules_weight_{thickness}` column
thickness – the thickness to use
family – the family for which we want the sample count

Returns

the number of samples filtered by the family

Return type

Int

Raises

None –

hevslib.scikitlearn.getForestInfo(forest, df)

Get information about a sklearn forest classifier

Parameters

forest – sklearn forest classifier model
df – dataframe that was used to train the model; it is needed to get the feature names

Returns

dataframe containing the information about the forest

Return type

Pandas Dataframe

Raises

None –

hevslib.scikitlearn.getReverseFamily(family, log=None)

Return the inverse of a family (feature + inverted sign)

Parameters: family – the family
Returns: the inverse of the family
Return type: String
Raises: None –

hevslib.scikitlearn.getSubFamily(ruleset_list, family_list)

Return the list of sub-families present in ruleset_list, excluding those in family_list

Parameters

ruleset_list – list of rulesets from which the families are extracted
family_list – list of families to exclude from the returned list

Returns

The sub_family list

Return type

List

Raises

None –

hevslib.scikitlearn.getThreshold(rule)

Return the threshold of a rule

Parameters: rule – the rule
Returns: the threshold
Return type: Float
Raises: None –
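
To make the rule / family / threshold vocabulary concrete, here is a minimal sketch that assumes a rule is a string of the form `feature sign threshold` with the threshold as the last space-separated token. The string format, the sign-inversion table and the helper names are all assumptions for illustration; the real functions may represent rules differently.

```python
# Assumed logical negation of each comparison sign
INVERSE_SIGN = {"<=": ">", ">": "<=", "<": ">=", ">=": "<", "==": "!=", "!=": "=="}


def get_family_sketch(rule):
    """Hypothetical: drop the trailing threshold, keeping 'feature sign'."""
    return rule.rsplit(" ", 1)[0]


def get_threshold_sketch(rule):
    """Hypothetical: parse the trailing threshold as a float."""
    return float(rule.rsplit(" ", 1)[1])


def get_reverse_family_sketch(family):
    """Hypothetical: keep the feature, invert the sign."""
    feature, sign = family.rsplit(" ", 1)
    return f"{feature} {INVERSE_SIGN[sign]}"


rule = "age <= 30.5"
family = get_family_sketch(rule)
threshold = get_threshold_sketch(rule)
reverse = get_reverse_family_sketch(family)
```

Under these assumptions the rule `age <= 30.5` belongs to the family `age <=`, has threshold `30.5`, and its reverse family is `age >`.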

hevslib.scikitlearn.getTreeInfo(tree_index=0, tree=None, df=None)

Get information about a sklearn decision tree

Parameters

tree_index – optional index if we have multiple trees
tree – sklearn tree model
df – dataframe that was used to train the tree; it is needed to get the feature names

Returns

dataframe containing the information about the tree

Return type

Pandas Dataframe

Raises

None –
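
A fitted scikit-learn tree exposes its structure through the `tree_` attribute (`children_left`, `children_right`, `feature`, `threshold`, `n_node_samples`). The sketch below collects those arrays into one row per node and shows why the training dataframe is needed for feature names. The column names and the helper name are illustrative assumptions; the layout of the dataframe returned by getTreeInfo is not documented here.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def tree_info_sketch(tree, df, tree_index=0):
    """Hypothetical helper: one row per node of a fitted decision tree."""
    t = tree.tree_
    rows = []
    for node in range(t.node_count):
        is_leaf = t.children_left[node] == -1  # -1 marks a missing child
        rows.append({
            "tree_id": tree_index,
            "node_id": node,
            "is_leaf": bool(is_leaf),
            # tree_.feature holds column positions, so map them to names
            "feature": None if is_leaf else df.columns[t.feature[node]],
            "threshold": None if is_leaf else float(t.threshold[node]),
            "n_samples": int(t.n_node_samples[node]),
        })
    return pd.DataFrame(rows)


X = pd.DataFrame({"a": [0, 1, 2, 3], "b": [5, 5, 5, 5]})
y = [0, 0, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
info = tree_info_sketch(clf, X)
```

For a forest, the same helper could be applied to each tree in `forest.estimators_`, passing the position in that list as `tree_index`.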

hevslib.scikitlearn.groupSimilarRules(df)

Count the number of occurrences and of similar rules in a dataframe

Occurrence: Count of rules that have exactly the same features, signs and thresholds

Similar: Count of rules that have exactly the same features and signs but different thresholds

Parameters: df – The dataframe; must contain the columns `rules_list` and `forest_id`
Returns: The dataframe with columns `rules_list`, `occurence`, `origin`, `similar_rules`
Return type: Pandas Dataframe
Raises: None –
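
The two counts can be illustrated in plain Python. The sketch assumes each ruleset is a list of `feature sign threshold` strings and ignores the `forest_id` / `origin` bookkeeping; the helper names and the output layout are hypothetical.

```python
from collections import Counter


def family_key(ruleset):
    """Hypothetical: reduce each rule to 'feature sign' by dropping the threshold."""
    return tuple(rule.rsplit(" ", 1)[0] for rule in ruleset)


def count_similar_sketch(rulesets):
    """Hypothetical helper: occurrence and similar counts per distinct ruleset."""
    exact = Counter(tuple(r) for r in rulesets)
    families = Counter(family_key(r) for r in rulesets)
    summary = {}
    for ruleset, occurrence in exact.items():
        # Same features and signs, minus the exact duplicates
        similar = families[family_key(ruleset)] - occurrence
        summary[ruleset] = {"occurrence": occurrence, "similar": similar}
    return summary


rulesets = [
    ["A <= 5", "B > 3"],
    ["A <= 5", "B > 3"],
    ["A <= 7", "B > 3"],
    ["C > 1"],
]
summary = count_similar_sketch(rulesets)
```

Here `["A <= 5", "B > 3"]` occurs twice and has one similar ruleset (`["A <= 7", "B > 3"]`), while `["C > 1"]` occurs once and has none.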

hevslib.scikitlearn.keepOnlyBiggestLeaf(forest_df, log=None)

Prune a sklearn forest to keep only the branch leading to the biggest leaf in each tree

Parameters

forest_df – dataframe containing information about the forest (given by getForestInfo)

Returns

dataframe containing only the branch leading to the biggest leaf of each tree
list containing the IDs of problematic trees

Return type

tuple(Pandas Dataframe, List)

Raises

None –
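
The idea of keeping only the branch to the biggest leaf can be sketched directly on a scikit-learn tree rather than on the forest dataframe. The sketch assumes "biggest" means the leaf holding the most training samples (`n_node_samples`); that reading, and the helper name, are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def biggest_leaf_path_sketch(tree):
    """Hypothetical helper: node ids from the root to the largest leaf."""
    t = tree.tree_
    # Record each node's parent so the path can be walked upwards
    parent = {}
    for node in range(t.node_count):
        for child in (t.children_left[node], t.children_right[node]):
            if child != -1:
                parent[int(child)] = node
    leaves = [n for n in range(t.node_count) if t.children_left[n] == -1]
    biggest = max(leaves, key=lambda n: t.n_node_samples[n])
    path = [biggest]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


X = np.array([[0], [1], [2], [3], [4], [5]])
y = [0, 0, 0, 0, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
path = biggest_leaf_path_sketch(clf)
```

When several leaves tie for the largest sample count, `max` keeps the first one encountered; a real implementation would need an explicit tie-breaking rule.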

hevslib.scikitlearn.mergeRulesOrder(list_of_ruleset)

Merge a list of similar rulesets that differ only in rule order, keeping the order that appears most often

Parameters: list_of_ruleset – List of similar rulesets in the format: [[A < 5, B > 3], [B > 3, A < 5], [B > 3, A < 5], …]
Returns: The ruleset in the most frequent order, e.g. [B > 3, A < 5]
Return type: List
Raises: None –
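
The example in the parameter description can be reproduced with a simple frequency count. This is a sketch of the idea using `collections.Counter` with rules written as strings; the helper name is hypothetical and the actual implementation may differ.

```python
from collections import Counter


def most_common_order_sketch(list_of_ruleset):
    """Hypothetical helper: return the rule order that appears most often."""
    counts = Counter(tuple(ruleset) for ruleset in list_of_ruleset)
    # most_common(1) yields [(ordering, count)] for the top ordering
    ordering, _ = counts.most_common(1)[0]
    return list(ordering)


merged = most_common_order_sketch([
    ["A < 5", "B > 3"],
    ["B > 3", "A < 5"],
    ["B > 3", "A < 5"],
])
```

With the input above, the order `["B > 3", "A < 5"]` appears twice and is the one kept.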

hevslib.scikitlearn.replaceSignToText(family)

Convert the sign of a family to the corresponding text. Used for HTML files when the family is part of the file name.

Parameters: family – The family
Returns: The family with the sign converted to text
Return type: String
Raises: None –

hevslib.scikitlearn.scaleMinMax(df, center_zero=False, verbose=True)

Scales values with the min-max method: (x_i - min(x)) / (max(x) - min(x)). Uses all columns of type int64 and float64.

Parameters

df – pandas dataframe
center_zero – center around zero
verbose – bool give some informational output

Returns

the scaled dataframe

Return type

Pandas Dataframe

Raises

None –
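
The formula above can be applied column by column with plain pandas. This is a minimal sketch with a hypothetical helper name. It does not cover `center_zero`, whose exact behavior is not described here.

```python
import pandas as pd


def scale_min_max_sketch(df):
    """Hypothetical helper: min-max scale int64/float64 columns to [0, 1]."""
    scaled = df.copy()
    numeric = scaled.select_dtypes(include=["int64", "float64"]).columns
    col_min = scaled[numeric].min()
    col_range = scaled[numeric].max() - col_min
    # (x_i - min(x)) / (max(x) - min(x)), column by column
    scaled[numeric] = (scaled[numeric] - col_min) / col_range
    return scaled


df = pd.DataFrame({"height": [150.0, 160.0, 170.0], "name": ["a", "b", "c"]})
scaled_df = scale_min_max_sketch(df)
```

A constant column has `max(x) - min(x) = 0`, so this sketch would produce NaN for it; a production version would need to handle that case.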

hevslib.scikitlearn.scaleStandard(df, verbose=True)

Scales values with the standard method: (x_i - mean(x)) / stdev(x). Uses all columns of type int64 and float64.

Parameters

df – pandas dataframe
verbose – bool give some informational output

Returns

the scaled dataframe

Return type

Pandas Dataframe

Raises

None –
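
A sketch of standard scaling in pandas follows. Which standard deviation the function uses is not stated; the example assumes the population standard deviation (`ddof=0`), which is what scikit-learn's `StandardScaler` uses, whereas pandas defaults to `ddof=1`. The helper name is hypothetical.

```python
import pandas as pd


def scale_standard_sketch(df):
    """Hypothetical helper: z-score scale int64/float64 columns."""
    scaled = df.copy()
    numeric = scaled.select_dtypes(include=["int64", "float64"]).columns
    mean = scaled[numeric].mean()
    # ddof=0 gives the population standard deviation (an assumption)
    std = scaled[numeric].std(ddof=0)
    scaled[numeric] = (scaled[numeric] - mean) / std
    return scaled


df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "label": ["a", "b", "c", "d"]})
standard_df = scale_standard_sketch(df)
```

After scaling, each numeric column has mean 0 and (population) standard deviation 1.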

hevslib.scikitlearn.trainTestSplitTarget(df, target, by=None, testSize=0.2)

Splits the dataset into train and test sets by a feature

Parameters

df – pandas dataframe ml set
target – pandas dataframe target feature set
by – feature to split by (default None)
testSize – size of the test set (default 0.2)

Returns

xTrain, xTest, yTrain and yTest dataframes

Return type

Pandas Dataframes

Raises

None –
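
A plain train/test split of a feature set and a target set can be sketched with scikit-learn's `train_test_split`. How the `by` argument influences the split is not described on this page, so the example covers only the default case; the data and `random_state` are made up for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"f1": range(10), "f2": range(10, 20)})
target = pd.DataFrame({"y": [0, 1] * 5})

# test_size=0.2 mirrors the documented default testSize=0.2
x_train, x_test, y_train, y_test = train_test_split(
    df, target, test_size=0.2, random_state=0
)
```

Passing both dataframes in one call keeps the rows of the features and the target aligned in each split.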