Table of contents
1. Introduction
2. Types of Plot
3. What is a Histogram?
4. Creating a Matplotlib Histogram
5. When to use Histogram?
6. Essential Functions
6.1. The Title Function
6.1.1. Syntax
6.1.2. Example
6.2. The Axis Labelling Functions
6.2.1. Syntax
6.2.2. Example
6.3. The Show Function
6.3.1. Syntax
6.3.2. Example
6.4. Histogram Function
6.4.1. Syntax
6.4.2. Example
6.5. The Tick Marks Function
6.5.1. Syntax
6.5.2. Example
7. Matplotlib Package
8. Histogram In Python Using Matplotlib
8.1. Approach 1
8.1.1. Code
8.1.2. Output
8.2. Approach 2
8.2.1. Code
8.2.2. Output
8.3. Approach 3
8.3.1. Code
8.3.2. Output
9. Frequently Asked Questions
9.1. Can one plot multiple histograms on the same graph?
9.2. Are there any alternative libraries to Matplotlib for creating a histogram?
9.3. Can one customize the appearance of the histogram created with Matplotlib?
9.4. Are there other plots available in Python for data visualization?
9.5. What is the purpose of a histogram in data analysis?
10. Conclusion
Last Updated: Mar 27, 2024

Python Histogram

Introduction 

Hello, Ninjas! In today’s article, we will create Python histograms using the powerful Matplotlib library. We will explore different approaches and functions to create Python histograms. Let’s elevate our data analysis game and unlock valuable insights with this must-read article.

Before delving into the approaches to creating a histogram, let’s look at the types of plots available.

Types of Plot

The different types of plots that can be used to visualize data are - 

  1. Histogram
  2. Line Plot
  3. Bar Chart
  4. Scatter Plot
  5. Area Plot
  6. Pie Chart

Today’s discussion will revolve around Histograms.

What is a Histogram?

A histogram is a type of visual data representation in which the data is grouped into given ranges (intervals, or bins). Each rectangular bar covers one interval, and its height is proportional to the frequency, that is, the number of data points that fall within that interval.

Creating a Matplotlib Histogram

Creating a histogram in Python using Matplotlib is a straightforward process. Matplotlib is a popular data visualization library that provides a wide range of plotting functions, including histograms. Here's a simple example of how to create a histogram using Matplotlib:

import matplotlib.pyplot as plt
import numpy as np

# Generate random data for the histogram
data = np.random.randn(1000)

# Create a histogram
plt.hist(data, bins=30, color='skyblue', edgecolor='black')

# Add labels and title
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram Example')

# Show the plot
plt.show()

Output

[Image: output histogram]

Explanation

In this example, we import the necessary libraries: matplotlib.pyplot for plotting and numpy to generate random data. We generate the data with np.random.randn(1000), which draws 1,000 samples from a standard normal distribution. We use plt.hist() to create the histogram. The bins parameter specifies the number of bins, and color and edgecolor set the color of the bars and their edges. Labels and a title are added using plt.xlabel(), plt.ylabel(), and plt.title(). Finally, we display the plot using plt.show().

When to use Histogram?

Histograms are best used in the following scenarios - 

  1. When we need to summarize the data.

  2. To compare the frequency of a specific group of data with other groups.

  3. To find the most or least frequently occurring interval.

  4. When we need an idea about the probability distribution of the given data.

Before creating a Python histogram using Matplotlib, we must know some essential functions and their usage to generate a histogram successfully.

Essential Functions

The essential functions needed to plot any histogram are - 

The Title Function

This function gives a title to the histogram we are creating. It takes a string as input and displays it as the title, centered above the plot by default.

Syntax 

The syntax for the title function is:

library_import_name.title()

Example

In the image below, the title() function sets the heading of the histogram.

[Image: Title Function]
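
As a minimal sketch of the title function: the snippet below assumes pyplot is imported as plot (the alias used later in this article) and selects the non-interactive Agg backend so that it runs without a display. The data values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
from matplotlib import pyplot as plot

sample_values = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical data

fig, ax = plot.subplots()
ax.hist(sample_values, bins=5)

# title() sets the title of the current axes; it is centered by default
plot.title("Sample Histogram")

# Read the title back from the current axes to confirm it was set
current_title = plot.gca().get_title()
print(current_title)
```

In an interactive session, calling plot.show() afterwards would display the figure with this title.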

The Axis Labelling Functions

As their name suggests, the axis labeling functions are used to label the x and y axes of the histogram. Each takes a string as input and displays it as the label, centered along the corresponding axis.

Syntax

The syntax for the Axis Labelling Function is:

library_import_name.xlabel()
library_import_name.ylabel()

Example

As you can see in the image below, the x and y axes are labeled.

[Image: Axis Labelling Function]
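
A short sketch of the axis labeling functions, under the same assumptions as elsewhere in this article (pyplot imported as plot). The Agg backend and the sample data are illustrative choices, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
from matplotlib import pyplot as plot

fig, ax = plot.subplots()
ax.hist([1, 2, 2, 3, 3, 3], bins=3)  # hypothetical data

# xlabel() and ylabel() label the axes of the current plot
plot.xlabel("Value")
plot.ylabel("Frequency")

# Read the labels back to confirm they were applied
x_label = ax.get_xlabel()
y_label = ax.get_ylabel()
print(x_label, y_label)
```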

The Show Function

As its name suggests, the show() function is used to display the figures that have been created, here, a histogram. 

Syntax

The syntax for the show function is:

library_import_name.show()

Example

As you can see in the image below, the show() function displays the histogram.

[Image: Show Function]

Histogram Function

This is the most crucial function, as it is the one that actually creates the histogram. Its two main arguments are the array of numbers (the data) and the bins, that is, the intervals into which the data is grouped and counted. 

Syntax 

The syntax for the Histogram creation function is:

library_import_name.hist()

Example

In the image below, the data is plotted as a histogram using the hist() function.

[Image: Histogram Function]

The .hist() function takes various parameters that provide flexibility in customizing the histogram according to our needs. Let’s look at them one by one:

Attribute | Description
x | The data to be plotted on the histogram: an array or a list of numbers (or a sequence of such arrays for multiple datasets).
bins | Specifies the number of intervals into which the data is divided, or, if a list is given, the edges of the intervals. It is an optional parameter.
density | If ‘True’, normalizes the bin heights so that the integral of the histogram comes out to be 1. It is an optional parameter of boolean type with a default value of ‘False’.
range | Represents the lower and upper bounds of the bins; if not provided, it is taken as (x.min(), x.max()). Values outside this range are ignored. It is an optional parameter of type ‘tuple’ with a default value of ‘None’.
histtype | Used to create histograms of different types: ‘bar’, ‘barstacked’, ‘step’, ‘stepfilled’. Its default value is ‘bar’.
align | Controls the positioning of bars relative to bin edges: ‘left’ centers the bars on the left bin edges, ‘mid’ centers them between the bin edges, and ‘right’ centers them on the right bin edges.
weights | Specifies an array of weights, one for each value of the data in x.
label | Used to provide a string, or a sequence of strings to match multiple datasets.
stacked | If ‘True’, multiple datasets are stacked on top of each other.
color | Sets the color or sequence of color specifications for the bars.
log | If ‘True’, the histogram axis is set to a logarithmic scale.
rwidth | Specifies the relative width of bars with respect to bin width.
orientation | Determines the orientation of the bars: ‘vertical’ or ‘horizontal’.
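
To make a few of these parameters concrete, here is a hedged sketch that combines bins, range, density, histtype, color, and label in one call. The random data, the seed, and the Agg backend are assumptions made only so the snippet is reproducible and runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import numpy as np
from matplotlib import pyplot as plot

# Hypothetical data: 1,000 samples from a standard normal distribution
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=0.0, scale=1.0, size=1000)

fig, ax = plot.subplots()
counts, edges, patches = ax.hist(
    data,
    bins=20,             # number of equal-width intervals
    range=(-3, 3),       # values outside this range are ignored
    density=True,        # normalize so the total area is 1
    histtype="stepfilled",
    color="skyblue",
    label="sample",
)

# With density=True, bar height times bar width sums to 1
area = float(np.sum(counts * np.diff(edges)))
print(round(area, 6))
```

hist() returns the bar heights, the bin edges, and the drawn patches, which is handy when the numbers behind the plot are needed as well.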

 

The Tick Marks Function

The tick marks function, xticks(), sets the positions of the tick marks on the x-axis. In a histogram, it is used to mark the values where each interval starts and ends. It takes an array of values and places the x-axis ticks at those values.

Syntax 

The syntax for the tick marks function is:

library_import_name.xticks()

Example

The xticks() function clearly defines the ending and starting of each interval in the image below.

[Image: Tick Marks Function]
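
The following sketch shows xticks() placing ticks exactly on the interval edges. The edges and data are hypothetical, and the Agg backend is assumed so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
from matplotlib import pyplot as plot

edges = [0, 10, 20, 30, 40]            # hypothetical interval edges
values = [3, 7, 12, 18, 25, 33, 38]    # hypothetical data

fig, ax = plot.subplots()
ax.hist(values, bins=edges)

# Place one tick at every interval edge
plot.xticks(edges)

# Read the tick positions back from the axes
tick_positions = [float(t) for t in ax.get_xticks()]
print(tick_positions)
```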

Before starting with the approaches, let’s go through the one prerequisite for plotting histograms: the Matplotlib package.

Matplotlib Package

Matplotlib is a library used to plot graphs and create visualizations in Python. It must be installed before it can be used. You can install the matplotlib package on macOS, Windows, or Linux by using the following pip commands - 

python -m pip install -U pip
python -m pip install -U matplotlib
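
One simple way to check that the installation worked is to import the package and print its version. This is only a quick sanity check; the version number you see depends on your environment.

```python
import matplotlib

# If the import succeeds, matplotlib is installed in this environment
installed_version = matplotlib.__version__
print("matplotlib version:", installed_version)
```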


Once you have installed the matplotlib package, you have to import it into the Jupyter Notebook or Python Script by using the following command -

from matplotlib import pyplot as plot


With this import, you can access the functionality provided by matplotlib to create a variety of plots and visualizations. Matplotlib also provides a range of customization options to enhance the look of a plot, such as the color of the bars, the labels, the title, and many more.

Histogram In Python Using Matplotlib

There are different ways to plot a Python histogram using the matplotlib plotting library. Let’s look at them one by one - 

Approach 1

  1. We import the pyplot module available in the matplotlib library as plot.

  2. Then we take values (heights of students) as input in the form of an array, say, [100, 101, 102, 111, 121, 129, 135, 136, 137, 138, 140, 141, 147, 150, 151, 152, 153, 154, 157, 158, 159, 162, 165, 166, 167, 169, 168, 171, 174, 179].

  3. Now we divide these values into intervals of equal size. We can divide the values (heights) into any number of intervals; here we divide them into 8 intervals from 100 to 180.

  4. According to these intervals, the heights of students are distributed. The table portraying the same is (based on the taken example) -
     [Image: Intervals According to Data]
     In the histogram, the x-axis will represent the intervals, and the y-axis will have the count of students.

  5. To represent these values on a histogram, we use plot.hist().

  6. We set the title of the histogram by using the plot.title() function.

  7. We can also name the x and y axes according to our needs by using the plot.xlabel() and plot.ylabel() functions, respectively.
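
The interval counts described in step 4 can be reproduced with NumPy. This is a minimal sketch, assuming numpy is installed; np.histogram is used here only to compute the counts, it does not draw anything.

```python
import numpy as np

height_of_students = [100, 101, 102, 111, 121, 129, 135, 136, 137, 138,
                      140, 141, 147, 150, 151, 152, 153, 154, 157, 158,
                      159, 162, 165, 166, 167, 169, 168, 171, 174, 179]

# Edges of the 8 intervals from 100 to 180
intervals = [100, 110, 120, 130, 140, 150, 160, 170, 180]

# Count how many heights fall into each interval
counts, edges = np.histogram(height_of_students, bins=intervals)

# Print one row per interval: lower edge, upper edge, number of students
for lower, upper, count in zip(edges[:-1], edges[1:], counts):
    print(f"{lower}-{upper}: {count}")

interval_counts = counts.tolist()
```

For the data above, the eight intervals contain 3, 1, 2, 4, 3, 8, 6, and 3 students respectively, which adds up to all 30 values.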

Code 

# Importing matplotlib
from matplotlib import pyplot as plot

# Inserting data
height_of_students = [100, 101, 102, 111,121, 129, 135, 136, 137, 138, 140, 141, 147, 150, 151, 152, 153, 154, 157, 158, 159, 162, 165, 166, 167, 169, 168, 171, 174, 179]

# Creating the intervals (bin edges)
intervals = [100, 110, 120, 130, 140, 150, 160, 170, 180]

# Title of histogram
plot.title(" HEIGHTS OF STUDENTS ")

# Creating histogram using hist() function
plot.hist(height_of_students, intervals)

# Labeling the x and y-axis
plot.xlabel("Height")
plot.ylabel("No. Of Students")

# Printing histogram
plot.show()

Output 

[Image: Python Histogram Using Approach 1]

Now, as you can see, it can be hard to tell from the histogram where each interval ends, because the default tick marks on the x-axis are not guaranteed to coincide with the interval boundaries. To rectify this, we use the plot.xticks() function. That brings us to the second approach.

Approach 2 

In this approach, we use the plot.xticks() function to clearly mark the interval boundaries on the x-axis (the rest of the code and approach remain the same).

Code

# Importing matplotlib
from matplotlib import pyplot as plot

# Inserting data
height_of_students = [100, 101, 102, 111,121, 129, 135, 136, 137, 138, 140, 141, 147, 150, 151, 152, 153, 154, 157, 158, 159, 162, 165, 166, 167, 169, 168, 171, 174, 179]

# Creating the intervals (bin edges)
intervals = [100, 110, 120, 130, 140, 150, 160, 170, 180]

# Title of histogram
plot.title(" HEIGHTS OF STUDENTS ")

# Creating histogram using hist() function
plot.hist(height_of_students, intervals)

# Labeling the x and y-axis
plot.xlabel("Height")
plot.ylabel("No. Of Students")

# Using the xticks() function to mark where each interval ends
plot.xticks([100, 110, 120, 130, 140, 150, 160, 170, 180])

# Printing histogram
plot.show()

Output

[Image: Python Histogram Using Approach 2]

Here, we can clearly see the distribution of heights across the provided intervals.

But it is tedious to write out the tick marks by hand every time we plot a histogram. To avoid this, we can define the bins once and reuse them. This brings us to the third approach.

Approach 3 

In this approach, we use the bins parameter of hist() to ease the task of marking the intervals on the x-axis. If bins is an integer N, the range of the data (from its minimum to its maximum) is divided into N equal-width intervals. If bins is a list, its values are used as the interval edges. N corresponds to the number of rows in Table 1 (the interval table in Approach 1). Here we build the list of edges once and pass the same list to both hist() and xticks(), instead of writing out the whole array again.

Code 

# Importing matplotlib
from matplotlib import pyplot as plot

# Inserting data
height_of_students = [100, 101, 102, 111, 121, 129, 135, 136, 137, 138, 140, 141, 147, 150, 151, 152, 153, 154, 157, 158, 159, 162, 165, 166, 167, 169, 168, 171, 174, 179]

# Title of the histogram
plot.title(" HEIGHTS OF STUDENTS ")

# Labeling the x and y-axis
plot.xlabel("Height")
plot.ylabel("No. Of Students")

# Building the list of interval edges: 100, 110, ..., 180.
# 10 is the width of each interval; the stop value 181 is exclusive, so 180 is included.
bins = list(range(100, 181, 10))

# Creating histogram using hist() function.
# Passing the list of edges makes the bars line up with the intervals.
plot.hist(height_of_students, bins=bins)

# Passing bins as argument to xticks function so that it marks the x-axis accordingly.
plot.xticks(bins)

# Printing histogram
plot.show()

Output

[Image: Python Histogram Using Approach 3]
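
The difference between passing an integer and passing a list of edges to bins can be checked without drawing anything. The sketch below uses np.histogram, assuming numpy is installed, to compare the edges produced in each case for the student heights.

```python
import numpy as np

height_of_students = [100, 101, 102, 111, 121, 129, 135, 136, 137, 138,
                      140, 141, 147, 150, 151, 152, 153, 154, 157, 158,
                      159, 162, 165, 166, 167, 169, 168, 171, 174, 179]

# Integer bins: the range from min (100) to max (179) is split into 8 equal parts
_, auto_edges = np.histogram(height_of_students, bins=8)

# Explicit edges: the intervals are exactly 100-110, 110-120, ..., 170-180
_, explicit_edges = np.histogram(height_of_students, bins=list(range(100, 181, 10)))

print(auto_edges)
print(explicit_edges)
```

With bins=8 each interval is (179 - 100) / 8 = 9.875 wide, so the edges do not fall on multiples of 10. Passing the explicit list keeps the bars aligned with ticks at 100, 110, and so on.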

Frequently Asked Questions

Can one plot multiple histograms on the same graph?

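Yes, matplotlib allows you to plot multiple histograms in a single graph, for example by calling hist() more than once on the same plot or by passing several datasets to a single hist() call. 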
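
As a rough sketch of this, the snippet below overlays two histograms on the same axes. The two groups are randomly generated, hypothetical data, and the Agg backend is assumed so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import numpy as np
from matplotlib import pyplot as plot

# Hypothetical data: two groups of 200 heights each
rng = np.random.default_rng(seed=1)
group_a = rng.normal(160, 8, 200)
group_b = rng.normal(170, 8, 200)

# Use the same edges for both groups so the bars are comparable
shared_bins = np.arange(120, 211, 5)

fig, ax = plot.subplots()
# alpha makes the bars semi-transparent so overlapping regions stay visible
counts_a, _, _ = ax.hist(group_a, bins=shared_bins, alpha=0.5, label="Group A")
counts_b, _, _ = ax.hist(group_b, bins=shared_bins, alpha=0.5, label="Group B")
ax.legend()

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

Sharing one set of bin edges is a design choice here: if each call picked its own bins, bars of different widths would overlap and be harder to compare.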

Are there any alternative libraries to Matplotlib for creating a histogram?

Though matplotlib is widely used for data visualizations, we can also use Seaborn, Plotly, and Pandas, which have their unique features.

Can one customize the appearance of the histogram created with Matplotlib?

Yes, matplotlib offers a wide variety of customization options, including adjusting colors, number of bins, title, etc.

Are there other plots available in Python for data visualization? 

Yes, Python offers a diverse set of plots besides histograms, such as line plots, bar charts, scatter plots, area plots, and pie charts.

What is the purpose of a histogram in data analysis?

Histograms are used to represent the frequency distribution of a given dataset visually.

Conclusion 

In this article, we explored the world of Python histograms using the Matplotlib library. We learned some essential functions needed to create a histogram and discovered various methods to create a histogram in Python. We learned how effectively we can visualize the given data and gain insights from the datasets. 

If you liked our article, do upvote our article and help other ninjas grow. You can refer to our Guided Path on Coding Ninjas Studio to upskill yourself in Data Structures and Algorithms, Competitive Programming, System Design, and many more!

Happy Learning!