Table of contents
1. Introduction
2. Label Encoding in Python
3. Example of Label Encoding
4. Example of Label Encoding (Iris Dataset)
5. Creating the Dataset
   5.1. Example
6. Limitation of Label Encoding
7. Example for Limitation of Label Encoding
8. Nominal Scale
9. Ordinal Scale
10. Label Encoding Using the Scikit-learn Library
    10.1. Example
11. Label Encoding Using Category Codes
    11.1. Example
12. Frequently Asked Questions
    12.1. What is label encoding in Python?
    12.2. What are the limitations of label encoding?
    12.3. How can I perform label encoding in Python?
13. Conclusion
Last Updated: Aug 25, 2025

Label Encoding in Python

Author Sinki Kumari

Introduction

Label encoding is a popular technique in data preprocessing, particularly in machine learning. It converts categorical data (like names or categories) into numerical format, making it easier for algorithms to process and analyze. 

This article will introduce you to label encoding, its functionality, examples using Python, and potential limitations. 

Label Encoding in Python

Label encoding assigns a unique numerical value to each distinct category in a dataset. For example, categories like "red," "green," and "blue" can be converted to numbers such as 0, 1, and 2, respectively. This transformation helps machine learning models process the data effectively. Label encoding can be applied using libraries like scikit-learn or through manual methods.  

Scikit-learn provides a LabelEncoder class designed for this purpose. To use it, import LabelEncoder from sklearn.preprocessing, create an instance, and call its fit_transform() method on your categorical data. This method fits the encoder to the data and returns the corresponding integer value for each category.
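
The manual route mentioned above can be sketched in plain Python. This is a minimal illustration, not a standard API: the helper name build_label_mapping is hypothetical, and sorting the unique values is a choice made here so the codes match what a sorted-order encoder would produce.

```python
# Manual label encoding sketch: assign integers to the sorted unique categories.
categories = ["red", "green", "blue", "green", "red"]


def build_label_mapping(values):
    """Hypothetical helper: map each distinct value to an integer code."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


mapping = build_label_mapping(categories)
encoded = [mapping[category] for category in categories]

print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}
print(encoded)  # [2, 1, 0, 1, 2]
```

Keeping the mapping dictionary around makes it possible to encode later data with the same codes.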

Example of Label Encoding

Here is a basic example of label encoding in Python:

from sklearn.preprocessing import LabelEncoder

# Sample data
categories = ['dog', 'cat', 'rabbit', 'cat', 'dog', 'rabbit']

# Initialize LabelEncoder
label_encoder = LabelEncoder()

# Fit and transform the data
encoded_labels = label_encoder.fit_transform(categories)

print("Original Categories:", categories)
print("Encoded Labels:", encoded_labels)


Output:

Original Categories: ['dog', 'cat', 'rabbit', 'cat', 'dog', 'rabbit']
Encoded Labels: [1 0 2 0 1 2]


Explanation:

  • The LabelEncoder sorts the unique categories and assigns integers in that order:
    • "cat" = 0
    • "dog" = 1
    • "rabbit" = 2
  • The resulting integer array can be passed to algorithms that require numeric input.

Example of Label Encoding (Iris Dataset)

Let’s see how label encoding is applied to a real dataset, the Iris dataset.

from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelEncoder
import pandas as pd

# Load Iris dataset
data = load_iris()
iris_df = pd.DataFrame(data.data, columns=data.feature_names)
iris_df['species'] = data.target_names[data.target]

# Initialize LabelEncoder
label_encoder = LabelEncoder()
iris_df['encoded_species'] = label_encoder.fit_transform(iris_df['species'])

print(iris_df[['species', 'encoded_species']].head())


Output:

  species  encoded_species
0  setosa                0
1  setosa                0
2  setosa                0
3  setosa                0
4  setosa                0


Explanation:

  • The species column is encoded into integers. head() shows only setosa (0) because the first rows of the dataset all belong to that species.
     
  • Machine learning algorithms can now process this encoded column.
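
To see the codes for all three species rather than only the first rows, the fitted encoder can be inspected. The sketch below assumes the same Iris data as above; the variable names (species, mapping, sample_codes) are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelEncoder

# Rebuild the species labels from the Iris dataset.
data = load_iris()
species = data.target_names[data.target]

label_encoder = LabelEncoder()
encoded = label_encoder.fit_transform(species)

# classes_ holds the categories in the order of their integer codes.
mapping = {str(name): code for code, name in enumerate(label_encoder.classes_)}
print(mapping)  # {'setosa': 0, 'versicolor': 1, 'virginica': 2}

# One row from each block of 50 samples, then decoded back to names.
sample_codes = encoded[[0, 50, 100]].tolist()
decoded = label_encoder.inverse_transform(encoded[[0, 50, 100]]).tolist()
print(sample_codes)  # [0, 1, 2]
print(decoded)       # ['setosa', 'versicolor', 'virginica']
```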

Creating the Dataset

To demonstrate label encoding in action, let's create a sample dataset. We'll use the popular Pandas library to create a DataFrame containing categorical data. 

Example

import pandas as pd
# Create a sample dataset
data = {
    'color': ['red', 'green', 'blue', 'green', 'red', 'blue'],
    'size': ['small', 'medium', 'large', 'medium', 'small', 'large'],
    'material': ['wood', 'metal', 'plastic', 'wood', 'metal', 'plastic']
}


# Create a DataFrame
df = pd.DataFrame(data)


print(df)

 

Output:

   color    size material
0    red   small     wood
1  green  medium    metal
2   blue   large  plastic
3  green  medium     wood
4    red   small    metal
5   blue   large  plastic

 

In this example, we create a dictionary called `data` that contains three categorical variables: color, size, and material. We then use the pd.DataFrame() function to create a DataFrame `df` from the `data` dictionary. 

The resulting DataFrame has three columns representing the categorical variables, and each row represents an observation or data point. With our dataset ready, we can now apply label encoding to transform the categorical variables into numerical form.
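
One way to apply label encoding to this DataFrame is to fit a separate LabelEncoder for each column. This is a sketch under that assumption; the names encoders and encoded_df are illustrative, and other approaches are possible.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Same sample dataset as above.
df = pd.DataFrame({
    'color': ['red', 'green', 'blue', 'green', 'red', 'blue'],
    'size': ['small', 'medium', 'large', 'medium', 'small', 'large'],
    'material': ['wood', 'metal', 'plastic', 'wood', 'metal', 'plastic'],
})

# Fit one encoder per column and keep it so the mapping can be reused later.
encoders = {}
encoded_df = df.copy()
for column in df.columns:
    encoder = LabelEncoder()
    encoded_df[column] = encoder.fit_transform(df[column])
    encoders[column] = encoder

print(encoded_df['color'].tolist())     # [2, 1, 0, 1, 2, 0]
print(encoded_df['size'].tolist())      # [2, 1, 0, 1, 2, 0]
print(encoded_df['material'].tolist())  # [2, 0, 1, 2, 0, 1]
```

Note that size is encoded alphabetically here (large = 0, medium = 1, small = 2), which does not reflect the natural small-to-large order.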

Limitation of Label Encoding

Although label encoding is simple and effective, it has limitations, especially when applied to nominal data, where the integers suggest an order that does not exist.

  1. Ordinal Misinterpretation: Algorithms may treat the encoded integers as ordered quantities (for example, 2 > 1 > 0) even when the categories have no such order, which can lead to biased results.
     
  2. Inconsistent Results: Encoders fitted separately on different datasets can assign different integers to the same category, which may yield inconsistent outcomes.
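
The second limitation can be illustrated with a small sketch. The lists train_colors and new_colors are made-up sample data; the point is only that refitting on different data changes the code assigned to the same category.

```python
from sklearn.preprocessing import LabelEncoder

train_colors = ["red", "green", "blue"]
new_colors = ["red", "green"]

# Fitting a second encoder on the new data changes the code for "red".
train_encoder = LabelEncoder().fit(train_colors)
refit_encoder = LabelEncoder().fit(new_colors)
print(train_encoder.transform(["red"])[0])  # 2
print(refit_encoder.transform(["red"])[0])  # 1

# Reusing the encoder fitted on the first dataset keeps the codes consistent.
consistent_codes = train_encoder.transform(new_colors).tolist()
print(consistent_codes)  # [2, 1]
```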

Example for Limitation of Label Encoding

Example

from sklearn.preprocessing import LabelEncoder

# Categories with no inherent order
categories = ['red', 'green', 'blue']

# Initialize LabelEncoder
label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(categories)
print("Original Categories:", categories)
print("Encoded Labels:", encoded_labels)


Output:

Original Categories: ['red', 'green', 'blue']
Encoded Labels: [2 1 0]

 

Here, "blue" is encoded as 0, "green" as 1, and "red" as 2. However, there is no inherent order among the colors, so the model might wrongly infer relationships between these numbers.
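
To make the problem concrete, the sketch below treats the codes as plain numbers, the way a distance-based model might. The helper code_distance is hypothetical and exists only for this illustration.

```python
# Codes taken from the output above.
codes = {"blue": 0, "green": 1, "red": 2}


def code_distance(first, second):
    """Hypothetical helper: numeric gap between two encoded categories."""
    return abs(codes[first] - codes[second])


# Numerically, blue looks "closer" to green than to red,
# although the colors themselves have no such relationship.
print(code_distance("blue", "green"))  # 1
print(code_distance("blue", "red"))    # 2
```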

Nominal Scale

Nominal data represents categories without any order or ranking. For example:

  • Categories: "Apple," "Banana," "Cherry"
     
  • Encoded Labels: [0, 1, 2]

In such cases, label encoding is often unsuitable as it can imply a non-existent order among the categories.

Ordinal Scale

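Ordinal data represents categories with a meaningful order. For example:

  • Categories: "Low," "Medium," "High"
     
  • Encoded Labels: [0, 1, 2]

In this case, integer encoding works well, provided the integers follow the category order. Note that LabelEncoder assigns codes in sorted (alphabetical) order, which would give "High" = 0, "Low" = 1, and "Medium" = 2, so the mapping above has to be specified explicitly.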
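
A sketch of one way to state the order explicitly uses an ordered pandas Categorical. The sample list levels is made up for illustration, and the LabelEncoder call is included only to show the alphabetical codes it would produce for comparison.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

levels = ["Low", "High", "Medium", "Low"]

# LabelEncoder sorts the categories, so "High" gets 0 and "Low" gets 1.
alphabetical_codes = LabelEncoder().fit_transform(levels).tolist()
print(alphabetical_codes)  # [1, 0, 2, 1]

# An ordered Categorical lets the intended order be given explicitly.
ordered = pd.Categorical(levels, categories=["Low", "Medium", "High"], ordered=True)
ordered_codes = ordered.codes.tolist()
print(ordered_codes)  # [0, 2, 1, 0]
```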

Label Encoding Using the Scikit-learn Library

The scikit-learn library provides a straightforward way to perform label encoding. A fitted LabelEncoder stores the categories it learned in its classes_ attribute, so the mapping can be inspected and reapplied consistently.

Example

from sklearn.preprocessing import LabelEncoder

# Sample data
animals = ['dog', 'cat', 'rabbit']

# Initialize LabelEncoder
label_encoder = LabelEncoder()

# Fit and transform the data
encoded_animals = label_encoder.fit_transform(animals)

print("Encoded Values:", encoded_animals)
print("Mapping:", dict(zip(label_encoder.classes_, range(len(label_encoder.classes_)))))

 

Output:

Encoded Values: [1 0 2]
Mapping: {'cat': 0, 'dog': 1, 'rabbit': 2}


Explanation:

The LabelEncoder class converts categories to integers. Because classes_ is stored in sorted order, "cat", "dog", and "rabbit" map to 0, 1, and 2. The printed mapping shows how each category was encoded.

Label Encoding Using Category Codes

For pandas users, the category data type provides a simple method to encode labels using cat.codes.

Example

import pandas as pd

# Sample data
data = {'fruits': ['apple', 'banana', 'cherry', 'apple', 'banana']}
fruits_df = pd.DataFrame(data)

# Convert to categorical and use category codes
fruits_df['encoded'] = fruits_df['fruits'].astype('category').cat.codes
print(fruits_df)

 

Output:

   fruits  encoded
0   apple        0
1  banana        1
2  cherry        2
3   apple        0
4  banana        1


Explanation:

  • The astype('category') method converts the column to the categorical type.
     
  • The cat.codes attribute holds the integer code of each value, assigned in the order of the column's categories.
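
The sketch below shows how the code-to-label mapping can be recovered from cat.categories, and how a missing value is handled: pandas gives it the code -1. The sample Series is made up for illustration.

```python
import pandas as pd

# Sample data with one missing value.
fruits = pd.Series(["apple", "banana", None, "cherry", "apple"])
as_category = fruits.astype("category")

# cat.categories lists the labels in the order of their integer codes.
code_to_label = dict(enumerate(as_category.cat.categories))
print(code_to_label)  # {0: 'apple', 1: 'banana', 2: 'cherry'}

# Missing values are not a category and receive the code -1.
codes = as_category.cat.codes.tolist()
print(codes)  # [0, 1, -1, 2, 0]
```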

Frequently Asked Questions

What is label encoding in Python?

Label encoding converts categorical data into numerical values so that machine learning algorithms can process it. It assigns a unique integer to each category.

What are the limitations of label encoding?

Label encoding can misrepresent relationships in nominal data and create misleading patterns when categories lack a meaningful order.

How can I perform label encoding in Python?

You can use libraries like scikit-learn’s LabelEncoder or pandas’ category type with cat.codes.

Conclusion

Label encoding is a fundamental technique in data preprocessing for converting categorical data into numerical format. While it is effective, it is essential to understand its limitations, particularly when working with nominal data. Using tools like scikit-learn or pandas, you can implement label encoding easily and consistently. 
