Ordinal Encoder

In data science and machine learning, handling categorical data effectively is crucial for building robust predictive models. Categorical data, which represents discrete labels or categories, needs to be converted into a numerical format before it can be fed into most machine learning algorithms. Ordinal encoding is one such conversion: it maps each category to a number that reflects the category's position in an ordered list.

Scikit-learn, a widely used Python library for machine learning, offers a convenient tool for ordinal encoding through its OrdinalEncoder class. The OrdinalEncoder transforms categorical features into an array of numeric codes (floats by default), one code per category. By default the categories are sorted, so string labels are numbered alphabetically. To make the encoded values reflect the inherent order of the categories, pass that order explicitly through the categories parameter.
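As a minimal sketch of why the category order matters, the snippet below encodes the same three labels twice: once with the default settings and once with an explicit order. The toy array X and the variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy column: one feature with three size labels.
X = np.array([['Small'], ['Medium'], ['Large']], dtype=object)

# Default settings: categories are inferred and sorted, so the
# numbering is alphabetical (Large, Medium, Small).
default_enc = OrdinalEncoder()
default_codes = default_enc.fit_transform(X).ravel().tolist()

# Explicit order: the list runs from the lowest to the highest category.
ordered_enc = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
ordered_codes = ordered_enc.fit_transform(X).ravel().tolist()

print(default_codes)  # [2.0, 1.0, 0.0]
print(ordered_codes)  # [0.0, 1.0, 2.0]
```

With the default settings, 'Large' receives the lowest code simply because it comes first alphabetically, which is rarely what an ordinal feature should mean.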

Ordinal Encoding with Pandas and scikit-learn

Start by importing pandas and OrdinalEncoder.

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

The dictionary below will be the basis of our pandas dataframe.

d = {
    'sales': [100000, 222000, 1000000, 522000, 111111, 222222, 1111111, 20000, 75000, 90000, 1000000, 10000],
    'city': ['Tampa', 'Tampa', 'Orlando', 'Jacksonville', 'Miami', 'Jacksonville', 'Miami', 'Miami', 'Orlando', 'Orlando', 'Orlando', 'Orlando'],
    'size': ['Small', 'Medium', 'Large', 'Large', 'Small', 'Medium', 'Large', 'Small', 'Medium', 'Medium', 'Medium', 'Small'],
}

To convert the dictionary to a dataframe, call pd.DataFrame and pass in the dictionary as the data parameter. Once this is completed, let's take a look at the first 5 rows using head.


df = pd.DataFrame(data=d)
df.head()

The next step is to find the unique values of the column we are about to encode. To do that, run: df['size'].unique()
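The short sketch below shows what unique returns. It uses a hypothetical Series holding only the first five size values from the dictionary, so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for df['size']: the first five values only.
size_col = pd.Series(['Small', 'Medium', 'Large', 'Large', 'Small'], name='size')

# unique() returns each distinct value once, in order of first appearance.
found = size_col.unique().tolist()
print(found)  # ['Small', 'Medium', 'Large']
```

Because unique lists values in order of first appearance rather than by rank, it is worth checking the result by eye before using it as the category order.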

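Now we create a list called sizes that contains all of the unique values, ordered from smallest to largest. The order of this list determines the number each category receives, and we need it when we create our OrdinalEncoder.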

sizes = ['Small', 'Medium', 'Large']

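Now we create our OrdinalEncoder. The categories parameter expects one list of categories per column being encoded, so we pass sizes wrapped in an outer list. After the encoder is created, we need to fit it to the column and transform the column.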

To see what this will look like we can print it out.

enc = OrdinalEncoder(categories = [sizes])

print(enc.fit_transform(df[['size']]))
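The printed codes are floats, because OrdinalEncoder uses a float64 output by default. If integer codes are preferred, one option is the encoder's dtype parameter. This is a sketch on a hypothetical three-row dataframe, not part of the walkthrough above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical three-row sample with the same column name as the tutorial.
sample = pd.DataFrame({'size': ['Small', 'Large', 'Medium']})

# dtype controls the type of the encoded output (default: np.float64).
int_enc = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']], dtype=np.int64)
int_codes = int_enc.fit_transform(sample[['size']])

print(int_codes.dtype)             # int64
print(int_codes.ravel().tolist())  # [0, 2, 1]
```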

To keep the encoded values, we have to assign the result of fit_transform back to the dataframe's size column. Once we do that, use head to see what the final dataframe looks like.

df['size'] = enc.fit_transform(df[['size']])
df.head()
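To check the mapping after encoding, one approach is to inspect the fitted encoder's categories_ attribute and to map the codes back to labels with inverse_transform. The sketch below assumes a hypothetical five-row dataframe named demo rather than the full dataframe from this walkthrough.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical five-row dataframe with the same 'size' column.
demo = pd.DataFrame({'size': ['Small', 'Medium', 'Large', 'Large', 'Small']})

enc = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
# ravel() flattens the (n, 1) result into a 1-D array for the column.
demo['size'] = enc.fit_transform(demo[['size']]).ravel()
print(demo['size'].tolist())  # [0.0, 1.0, 2.0, 2.0, 0.0]

# A category's position in categories_ is the code it was assigned.
print(list(enc.categories_[0]))  # ['Small', 'Medium', 'Large']

# inverse_transform maps the codes back to the original labels.
restored = enc.inverse_transform(demo[['size']]).ravel().tolist()
print(restored)  # ['Small', 'Medium', 'Large', 'Large', 'Small']
```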
