hevslib.scikitlearn module
hevslib - SciKit Learn functions
-
hevslib.scikitlearn.
addFamilyCount
(df) Add a new column (family_count) to the dataframe containing the number of rules in the same family
- Parameters
df – The dataframe; must contain the column `rules_list`
- Returns
The dataframe with the new column “family_count”
- Return type
Pandas Dataframe
- Raises
None –
-
hevslib.scikitlearn.
categoryToInt
(df, verbose=True) Encode columns of type category as integers
- Parameters
df – pandas dataframe
verbose – bool; whether to print informational output
- Returns
DataFrame: the encoded dataframe
Dict: dictionary containing the label encoder objects
- Return type
tuple(Pandas Dataframe, Dict)
- Raises
None –
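The idea behind label encoding can be sketched without pandas or scikit-learn. This is an illustration only, not hevslib's implementation: `encode_labels` is a hypothetical helper working on a plain list, and assigning codes in sorted class order is an assumption borrowed from scikit-learn's LabelEncoder.

```python
def encode_labels(values):
    """Map each distinct label to an integer code (hypothetical helper)."""
    # Assumption: codes follow the sorted order of the distinct labels.
    classes = sorted(set(values))
    mapping = {label: index for index, label in enumerate(classes)}
    encoded = [mapping[value] for value in values]
    # Returning the mapping mirrors the documented (dataframe, dict) pair.
    return encoded, mapping


codes, mapping = encode_labels(["red", "green", "red", "blue"])
```

Keeping the mapping alongside the encoded values is what makes it possible to convert integer codes back to the original categories later.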
-
hevslib.scikitlearn.
checkIfIsFamily
(ruleset, family) Check if a ruleset is part of a family
- Parameters
ruleset – The set of rules to check
family – The family, must be a list
- Returns
1 if ruleset is part of the family, 0 otherwise
- Return type
Boolean
- Raises
None –
-
hevslib.scikitlearn.
convertBackOneHotRules
(forest_df, label_encoders, log=None) Convert one-hot rules in a forest dataframe back to categories so they can be applied directly to the original data
- Parameters
forest_df – dataframe containing the information about the forest (returned by the getForestInfo function)
label_encoders – the one-hot encoder (returned by the categoryToInt function)
- Returns
forest dataframe with converted rules
- Return type
Pandas Dataframe
- Raises
None –
-
hevslib.scikitlearn.
encodeOneHot
(df, columns, verbose=True) Encode columns as one-hot, sklearn style
- Parameters
df – pandas dataframe
columns – list of columns to encode
verbose – bool; whether to print informational output
- Returns
DataFrame: the encoded dataframe
Dict: dictionary containing the one-hot encoder objects
- Return type
tuple(Pandas Dataframe, Dict)
- Raises
None –
-
hevslib.scikitlearn.
encodePdOneHot
(df, columns, verbose=True, log=None) Encode columns as one-hot, pandas style
- Parameters
df – pandas dataframe
columns – list of columns to encode
verbose – bool; whether to print informational output
- Returns
the encoded dataframe
- Return type
Pandas Dataframe
- Raises
None –
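As a rough, dependency-free picture of one-hot encoding, the sketch below expands one categorical field into 0/1 indicator fields. It is not hevslib's code: `one_hot` is a hypothetical helper operating on a list of dict rows, and the `column_value` naming is an assumption modeled on pandas-style dummy columns.

```python
def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator field per category (sketch)."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            # Assumed naming scheme: "<column>_<category>".
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


rows = [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}]
encoded_rows = one_hot(rows, "color")
```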
-
hevslib.scikitlearn.
extractFamilyRoot
(df) Extract and list all root families from a dataframe. A family with only one condition is considered a root.
- Parameters
df – The dataframe; must contain the column `rules_list`
- Returns
list of root families
- Return type
List
- Raises
None –
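The root rule stated above can be sketched as follows. This is a guess at the behavior, not the library's code: it assumes each entry of `rules_list` is a list of strings shaped like "feature sign threshold", and `extract_root_families` is a hypothetical stand-in that works on a plain list instead of a dataframe.

```python
def extract_root_families(rules_lists):
    """Collect the families of rulesets that contain exactly one condition."""
    roots = []
    for ruleset in rules_lists:
        if len(ruleset) != 1:
            continue  # more than one condition: not a root
        # Assumption: the family is the rule text without its threshold.
        family = ruleset[0].rsplit(" ", 1)[0]
        if family not in roots:
            roots.append(family)
    return roots


roots = extract_root_families(
    [["A <= 5"], ["A <= 7"], ["B > 3", "A <= 5"], ["C > 1"]]
)
```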
-
hevslib.scikitlearn.
extractThresholdList
(df, family) Extract and list all thresholds of a family from a dataframe
- Parameters
df – The dataframe; must contain the column `rules_list`
family – the family whose thresholds are extracted
- Returns
list of threshold values
- Return type
List
- Raises
None –
-
hevslib.scikitlearn.
formatRuleListForFilter
(rule_list, verbose=False, log=None) Format a list of rules produced by a forest/tree to be compatible with the filterRows function
- Parameters
rule_list – list of rules to format
verbose – bool; whether to print informational output
- Returns
list: rules formatted as [[feature1, threshold1], …, [featureN, thresholdN]]
list: operations (lte | gt | eq | neq)
- Return type
tuple(List, List)
- Raises
None –
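The two returned lists can be pictured with a small sketch. It is illustrative only: the input rule format ("feature sign threshold" strings) and the sign-to-operation table are assumptions, and `format_rule_list` is a hypothetical stand-in, not the function documented above.

```python
# Assumed mapping from comparison signs to the documented operation names.
SIGN_TO_OPERATION = {"<=": "lte", ">": "gt", "==": "eq", "!=": "neq"}


def format_rule_list(rule_list):
    """Split rules into [feature, threshold] pairs and operation names."""
    conditions = []
    operations = []
    for rule in rule_list:
        # Assumption: feature names contain no whitespace in this sketch.
        feature, sign, threshold = rule.split()
        conditions.append([feature, float(threshold)])
        operations.append(SIGN_TO_OPERATION[sign])
    return conditions, operations


conditions, operations = format_rule_list(["age <= 30.5", "income > 1200"])
```

Keeping the operations in a separate list, aligned by position with the conditions, matches the tuple(List, List) return shape described above.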
-
hevslib.scikitlearn.
getFamily
(rule) Return the family of a rule (feature + sign)
- Parameters
rule – the rule (feature + sign + threshold)
- Returns
the family (feature + sign)
- Return type
String
- Raises
None –
-
hevslib.scikitlearn.
getFamilySampleCount
(df, thickness, family) Extract the number of samples filtered by the family from a dataframe
- Parameters
df – The score dataframe; must contain the column `rules_weight_{thickness}`
thickness – the thickness to use
family – the family whose sample count is wanted
- Returns
the number of samples filtered by the family
- Return type
Int
- Raises
None –
-
hevslib.scikitlearn.
getForestInfo
(forest, df) Get information about a sklearn forest classifier
- Parameters
forest – sklearn forest classifier model
df – dataframe that was used to train the model; needed to get the feature names
- Returns
dataframe containing the information about the forest
- Return type
Pandas Dataframe
- Raises
None –
-
hevslib.scikitlearn.
getReverseFamily
(family, log=None) Return the inverse of a family (feature + inverted sign)
- Parameters
family – the family
- Returns
the inverse of the family
- Return type
String
- Raises
None –
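Inverting a family amounts to keeping the feature and swapping the sign for its logical complement. The sketch below assumes a family is a string such as "age <=" and that the sign is the last whitespace-separated token; the sign table and the `reverse_family` name are illustrative, not taken from hevslib.

```python
# Each sign paired with its logical complement (assumed set of signs).
INVERSE_SIGN = {
    "<=": ">", ">": "<=",
    "<": ">=", ">=": "<",
    "==": "!=", "!=": "==",
}


def reverse_family(family):
    """Return the family with the same feature and the complementary sign."""
    feature, sign = family.rsplit(" ", 1)
    return f"{feature} {INVERSE_SIGN[sign]}"


reversed_family = reverse_family("age <=")
```

Applying the function twice returns the original family, which is a quick sanity check for any sign table of this kind.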
-
hevslib.scikitlearn.
getSubFamily
(ruleset_list, family_list) Return the list of sub-families present in ruleset_list, excluding those in family_list
- Parameters
ruleset_list – list of rulesets from which to extract the families
family_list – list of families to exclude from the returned list
- Returns
The list of sub-families
- Return type
List
- Raises
None –
-
hevslib.scikitlearn.
getThreshold
(rule) Return the threshold of a rule
- Parameters
rule – the rule
- Returns
the threshold
- Return type
Float
- Raises
None –
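Since a rule is described as feature + sign + threshold and a family as feature + sign, the two parts can be separated at the last space. This is a minimal sketch under the assumption that rules are strings like "petal width <= 0.8"; `split_rule` is a hypothetical helper and does not reflect how the library stores rules internally.

```python
def split_rule(rule):
    """Split a rule string into (family, threshold)."""
    # Splitting from the right keeps feature names with spaces intact.
    family, threshold = rule.rsplit(" ", 1)
    return family, float(threshold)


family, threshold = split_rule("petal width <= 0.8")
```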
-
hevslib.scikitlearn.
getTreeInfo
(tree_index=0, tree=None, df=None) Get information about a sklearn decision tree
- Parameters
tree_index – optional index, for use when there are multiple trees
tree – sklearn tree model
df – dataframe that was used to train the tree; needed to get the feature names
- Returns
dataframe containing the information about the tree
- Return type
Pandas Dataframe
- Raises
None –
-
hevslib.scikitlearn.
groupSimilarRules
(df) Count the number of occurrences and similar rules inside a dataframe
Occurrence: Count of rules that have exactly the same features, signs and thresholds
Similar: Count of rules that have exactly the same features and signs but different thresholds
- Parameters
df – The dataframe; must contain the columns `rules_list` and `forest_id`
- Returns
The dataframe with columns `rules_list`, `occurence`, `origin`, `similar_rules`
- Return type
Pandas Dataframe
- Raises
None –
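The two counts defined above can be illustrated on plain lists. This sketch makes several assumptions: rules are "feature sign threshold" strings, rule order inside a ruleset matters, and the `origin` column is left out because its content is not described here. `count_occurrence_and_similar` is a hypothetical name.

```python
from collections import Counter


def family_key(ruleset):
    """Features and signs of a ruleset, with thresholds stripped."""
    return tuple(rule.rsplit(" ", 1)[0] for rule in ruleset)


def count_occurrence_and_similar(rulesets):
    exact = Counter(tuple(ruleset) for ruleset in rulesets)
    by_family = Counter(family_key(ruleset) for ruleset in rulesets)
    rows = []
    for ruleset, occurrence in exact.items():
        rows.append({
            "rules_list": list(ruleset),
            # Key spelled as in the documented column name.
            "occurence": occurrence,
            # Same features and signs, but a different threshold somewhere.
            "similar_rules": by_family[family_key(ruleset)] - occurrence,
        })
    return rows


rows = count_occurrence_and_similar([
    ["A <= 5", "B > 3"],
    ["A <= 5", "B > 3"],
    ["A <= 6", "B > 3"],
    ["C > 1"],
])
```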
-
hevslib.scikitlearn.
keepOnlyBiggestLeaf
(forest_df, log=None) Prune a sklearn forest to keep only the branch leading to the biggest leaf in each tree
- Parameters
forest_df – dataframe containing information about the forest (returned by getForestInfo)
- Returns
dataframe containing only the branch leading to the biggest leaf of each tree
list containing the IDs of problematic trees
- Return type
tuple(Pandas Dataframe, List)
- Raises
None –
-
hevslib.scikitlearn.
mergeRulesOrder
(list_of_ruleset) Merge a list of similar rulesets that differ only in rule order, keeping the order that appears most often
- Parameters
list_of_ruleset – list of similar rulesets in the format [[A < 5, B > 3], [B > 3, A < 5], [B > 3, A < 5], …]
- Returns
The ruleset in the most frequent order, e.g. [B > 3, A < 5]
- Return type
List
- Raises
None –
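Picking the most frequent ordering is a counting problem, which the following sketch reproduces on the example from the parameter description. It is one possible approach, not necessarily the one hevslib uses; `most_common_order` is a hypothetical name, and ties are resolved in favor of the ordering seen first.

```python
from collections import Counter


def most_common_order(list_of_ruleset):
    """Return the rule ordering that appears most often."""
    # Lists are not hashable, so each ordering is counted as a tuple.
    counts = Counter(tuple(ruleset) for ruleset in list_of_ruleset)
    best_order, _count = counts.most_common(1)[0]
    return list(best_order)


merged = most_common_order([
    ["A < 5", "B > 3"],
    ["B > 3", "A < 5"],
    ["B > 3", "A < 5"],
])
```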
-
hevslib.scikitlearn.
replaceSignToText
(family) Convert the sign of a family to the corresponding text, used for HTML files when the family appears in the file name
- Parameters
family – The family
- Returns
The family with the sign converted to text
- Return type
String
- Raises
None –
-
hevslib.scikitlearn.
scaleMinMax
(df, center_zero=False, verbose=True) Scale values with the min-max method: (x_i - min(x)) / (max(x) - min(x)). Applies to all columns of type int64 and float64.
- Parameters
df – pandas dataframe
center_zero – bool; center the values around zero
verbose – bool; whether to print informational output
- Returns
the scaled dataframe
- Return type
Pandas Dataframe
- Raises
None –
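The min-max formula above, applied to a single column, looks like this in plain Python. The sketch covers only the documented formula: the `center_zero` option is not reproduced because its exact behavior is not specified here, and returning zeros for a constant column is an assumption made to avoid division by zero.

```python
def scale_min_max(values):
    """Apply (x_i - min(x)) / (max(x) - min(x)) to a list of numbers."""
    low, high = min(values), max(values)
    if high == low:
        # Assumption: a constant column maps to 0.0 rather than raising.
        return [0.0 for _ in values]
    return [(x - low) / (high - low) for x in values]


scaled = scale_min_max([0, 5, 10])
```

After scaling, the smallest value maps to 0.0 and the largest to 1.0, with everything else in between.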
-
hevslib.scikitlearn.
scaleStandard
(df, verbose=True) Scale values with the standard method: (x_i - mean(x)) / stdev(x). Applies to all columns of type int64 and float64.
- Parameters
df – pandas dataframe
verbose – bool; whether to print informational output
- Returns
the scaled dataframe
- Return type
Pandas Dataframe
- Raises
None –
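For a single column, the standardization formula above can be sketched with the standard library. Whether hevslib uses the population or the sample standard deviation is not stated; this sketch assumes the population form, which is what scikit-learn's StandardScaler uses. `scale_standard` here is a stand-in working on a list, not the documented function.

```python
import statistics


def scale_standard(values):
    """Apply (x_i - mean(x)) / stdev(x) to a list of numbers."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


standardized = scale_standard([2, 4, 4, 4, 5, 5, 7, 9])
```

The result has mean 0 and, under the population convention, standard deviation 1.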
-
hevslib.scikitlearn.
trainTestSplitTarget
(df, target, by=None, testSize=0.2) Split a dataset into train and test sets by a feature
- Parameters
df – pandas dataframe; the ML feature set
target – pandas dataframe; the target feature set
by – feature to split by (default None)
testSize – size of the test set (default 0.2)
- Returns
xTrain, xTest, yTrain and yTest dataframes
- Return type
Pandas Dataframes
- Raises
None –