Table of contents
1. Introduction
2. Features
3. Installation
4. Implementation
4.1. Importing essential libraries
4.2. Loading and filtering the dataset
4.3. Creating Visualizations
4.3.1. Plotting curve and histogram
4.3.2. Plotting bar graph
4.3.3. Tabular data and its conversions
4.3.4. Heatmap
4.3.5. Pie chart using Bokeh
5. FAQs
6. Key Takeaways
Last Updated: Mar 27, 2024

Modern data visualization with HoloViews

Author soham Medewar

Introduction

Data analysis and visualization are essential parts of machine learning. Visualization helps us find patterns, anomalies, and trends in a dataset, and it is often the easiest way to understand what the data conveys, because we can see those patterns directly.

Data is visualized in the form of pie charts, bar graphs, histograms, scatter plots, etc., and every plot type has its own way of displaying the data.

HoloViews is an open-source Python package for building appealing visualizations with minimal code and effort. HoloViews supports 'matplotlib' and 'bokeh' as plotting backends.

Features

  • Allows you to create data structures that both store and display your data.
  • Matplotlib, Bokeh, and Plotly are used in the backend for plotting outputs.
  • Supports all recent releases of IPython and Jupyter Notebooks.
  • Supports indexing and slicing of data in arbitrarily high-dimensional spaces with rich semantics.

Installation

HoloViews is a Python package that you can install from the terminal or from a notebook cell, using either pip or conda.

Run the command below to install the package with pip. The leading "!" runs it from a notebook cell; omit it in a terminal.

!pip install holoviews

 

Alternate method (for conda users)

conda install -c pyviz holoviews bokeh

 

Implementation

Let's take a sample dataset and analyze the dataset using various visualization techniques.

Here’s the link for the dataset

Importing essential libraries

To visualize the data, we need to import a few libraries: pandas, NumPy, and seaborn, along with HoloViews itself. We will use 'bokeh' and 'matplotlib' as extensions for visualization purposes.

import numpy as np  # used below for arrays and histograms
import pandas as pd
import seaborn as sns
import holoviews as hv
from holoviews import opts

import math
# extensions used for visualization
hv.extension('bokeh', 'matplotlib')

 

The following output indicates that all the libraries have been successfully loaded.


Loading and filtering the dataset

We will load our dataset using the pandas library. The dataset describes Google Play Store apps. Its columns include the app name, category, rating, reviews, size, installs, etc.

Loading the dataset and printing its size.

data = pd.read_csv("googleplaystore.csv")
data.shape
(10841, 13)

Delete all the rows having “NaN” values. 

data = data.dropna()

data.shape

 

(9360, 13)

We can see the shape of the data before and after filtering. So, the first step is to filter the data. It is not always necessary to drop every row with "NaN" values; you can also replace the missing values.
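
As a rough illustration of the "replace instead of drop" option mentioned above, here is a minimal sketch. The tiny DataFrame and the choice of the column median as the fill value are assumptions made for this example and are not part of the original dataset handling.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset: two columns, with gaps in 'Rating'.
df = pd.DataFrame({
    "App": ["a", "b", "c", "d"],
    "Rating": [4.0, np.nan, 3.0, np.nan],
})

# Option 1: drop every row that contains a NaN (what the article does).
dropped = df.dropna()

# Option 2: keep all rows and fill missing ratings with the column median.
filled = df.copy()
filled["Rating"] = filled["Rating"].fillna(filled["Rating"].median())

print(dropped.shape)  # (2, 2)
print(filled.shape)   # (4, 2)
```

Filling keeps every row, at the cost of introducing estimated values, so which option is better depends on the analysis.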

Displaying the first five entries of the dataset.

data.head(5)

Creating Visualizations

Plotting curve and histogram

We can see the rating of apps in the dataset; it lies between 1 and 5. We will plot a frequency distribution of the ratings of all the apps.

# loading rating of all the apps in y
y = np.array(data['Rating'])

# plotting histogram
frequencies, edges = np.histogram(y, 50)
histogram = hv.Histogram((edges, frequencies))

# plotting curve: count the apps at each rating step of 0.1
xs = [i*0.1 for i in range(51)]
ys = []
for i in range(51):
    ys.append(0)
for i in range(len(y)):
    # round() guards against floating-point error before indexing
    ys[int(round(y[i]*10))]+=1
curve = hv.Curve((xs,ys), 'Rating', 'Frequency')

# plot both graphs together
curve + histogram

 

The x-axis shows the app ratings between 1 and 5, and the y-axis shows the number of apps. Each bar in the histogram denotes the total number of apps whose rating falls in that bin.
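
To make the relationship between bins and bars concrete, the sketch below runs np.histogram on a handful of made-up ratings (an assumption for illustration, not values from the dataset). It shows that the function returns one count per bin and one more edge than there are bins, which is the (edges, frequencies) pair that hv.Histogram receives above.

```python
import numpy as np

# Hypothetical ratings standing in for data['Rating'].
ratings = np.array([1.0, 3.9, 4.1, 4.1, 4.5, 5.0])

# Four equal-width bins over the closed range [1, 5].
frequencies, edges = np.histogram(ratings, bins=4, range=(1.0, 5.0))

print(edges)        # the 5 bin boundaries: 1, 2, 3, 4, 5
print(frequencies)  # one count per bin: 1, 0, 1, 4
```

The last bin is closed on both sides, so a rating of exactly 5.0 is counted in it.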

Plotting bar graph

In this part, I will plot the bar graph for the 'Installs' column from the dataset.

# loading 'Installs' column
y = np.array(data['Installs'])

# function to compute the frequency of each value
def freq(y):
    mp = {}
    for i in y:
        if (i in mp):
            mp[i] += 1
        else:
            mp[i] = 1
    return mp

# assigning X and Y: the keys and values of the map
X, Y = list(freq(y).keys()), list(freq(y).values())

# creating bar graph 
info = list(zip(X, Y))
bars = hv.Bars(info, hv.Dimension('Installs'), 'Frequency').opts(fontscale=0.7, width=1000, height=400, title='Frequency of Installs')
bars

 

 

The graph's x-axis represents the total number of installs, and the graph's y-axis represents frequency. A bar in the graph represents the total number of apps having x installs.

We can also invert the above graph by adding some simple code.

bars.relabel('Invert axes').opts(fontscale=1,invert_axes=True, width=700, height=500)

 

We have inverted the bar graph. (Installs can be seen clearly as compared to the previous graph)
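
The bars appear in the order in which the install labels are first encountered. If a numeric order is preferred, one possible approach is sketched below. It assumes the labels look like "1,000+" (an assumption about the column's format), and lower_bound is a hypothetical helper introduced only for this example.

```python
from collections import Counter

# Hypothetical 'Installs' values; the "1,000+" label format is an assumption.
installs = ["1,000+", "10+", "1,000+", "100,000+", "10+", "1,000+"]

def lower_bound(label):
    """Turn a label such as '1,000+' into the integer 1000."""
    return int(label.rstrip("+").replace(",", ""))

counts = Counter(installs)

# (label, frequency) pairs ordered from fewest to most installs.
info = sorted(counts.items(), key=lambda pair: lower_bound(pair[0]))
print(info)  # [('10+', 2), ('1,000+', 3), ('100,000+', 1)]
```

A list of pairs built this way has the same shape as the info list passed to hv.Bars above.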

Tabular data and its conversions

In this part, we will construct tabular data from the 'Category' column, storing the frequency of each category in a table. We will then use that table to construct various graphs.

Making tabular data

# loading category column
a = data["Category"]

# function to compute the frequency of each value
def freq(y):
    mp = {}
    for i in y:
        if (i in mp):
            mp[i] += 1
        else:
            mp[i] = 1
    return mp

# assigning x1 and y1: the keys and values of the map
x1, y1 = list(freq(a).keys()), list(freq(a).values())

# creating table
table = hv.Table((x1, y1), 'Category', 'Frequency').opts(fontscale=0.2, height=1000)
table
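
As an aside, pandas can produce the same two lists without a hand-written counting function. The sketch below uses a short made-up Series in place of data["Category"]; it is an alternative to, not a description of, the approach used above.

```python
import pandas as pd

# Hypothetical stand-in for data["Category"].
category = pd.Series(["GAME", "TOOLS", "GAME", "FAMILY", "GAME", "TOOLS"])

# value_counts() tallies each distinct value, most frequent first.
counts = category.value_counts()
x1, y1 = counts.index.tolist(), counts.tolist()

print(x1)  # ['GAME', 'TOOLS', 'FAMILY']
print(y1)  # [3, 2, 1]
```

Unlike the dictionary approach, this returns the categories sorted by frequency rather than in order of first appearance.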

 

 

Now we will use this table to plot the Scatter graph.

hv.Scatter(table).opts(fontscale=1,invert_axes=True, width=700, height=500)

 

Scatter plot for the above tabular data in inverted form.

Using the tabular form to plot the Curve graph.

hv.Curve(table).opts(fontscale=1,invert_axes=True, width=700, height=500)

 

Curve plot for the above tabular data in inverted form.

Using the tabular form to plot the Area graph.

hv.Area(table).opts(fontscale=1,invert_axes=True, width=700, height=500)

 

Area plot for the above tabular data in inverted form.

Using the tabular form to plot the Bar graph.

hv.Bars(table).opts(fontscale=1,invert_axes=True, width=700, height=500)

 

 

Bar plot for the above tabular data in inverted form.

We can also represent two plots under one graph using the ‘*’ operator. 

hv.Bars(table).opts(fontscale=1,invert_axes=True, width=700, height=500)*hv.Curve(table).opts(fontscale=1,invert_axes=True, width=700, height=500).options({'Curve': {'color': hv.Cycle('Set1'), 'width': 600}})

 

Heatmap

category_types = data['Category'].unique()
content_rating_types = data['Content Rating'].unique()

print(category_types, content_rating_types)

 

array(['ART_AND_DESIGN', 'AUTO_AND_VEHICLES', 'BEAUTY',
       'BOOKS_AND_REFERENCE', 'BUSINESS', 'COMICS', 'COMMUNICATION',
       'DATING', 'EDUCATION', 'ENTERTAINMENT', 'EVENTS', 'FINANCE',
       'FOOD_AND_DRINK', 'HEALTH_AND_FITNESS', 'HOUSE_AND_HOME',
       'LIBRARIES_AND_DEMO', 'LIFESTYLE', 'GAME', 'FAMILY', 'MEDICAL',
       'SOCIAL', 'SHOPPING', 'PHOTOGRAPHY', 'SPORTS', 'TRAVEL_AND_LOCAL',
       'TOOLS', 'PERSONALIZATION', 'PRODUCTIVITY', 'PARENTING', 'WEATHER',
       'VIDEO_PLAYERS', 'NEWS_AND_MAGAZINES', 'MAPS_AND_NAVIGATION'],
      dtype=object)
array(['Everyone', 'Teen', 'Everyone 10+', 'Mature 17+',
       'Adults only 18+', 'Unrated'], dtype=object)

We will create a heatmap for the above two columns. 

# initialising dictionary with a zero count for every (content rating, category) pair
dict_data = {}
for i in data['Category'].unique():
    for j in data['Content Rating'].unique():
        dict_data[(j, i)] = 0

# loading category and content rating column
category_ = np.array(data['Category'])
content_rating_ = np.array(data['Content Rating'])

# counting the apps in each (content rating, category) pair
for i in range(len(category_)):
    dict_data[(content_rating_[i], category_[i])] += 1

# flattening to (content rating, category, count) tuples
heat_map_data = []
for i in dict_data:
    heat_map_data.append((i[0], i[1], dict_data[i]))

# plotting heatmap
hm = hv.HeatMap(heat_map_data)
hm.opts(height = 600, width = 800, colorbar=True)

 

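The heatmap above shows, for every pair of category and content rating, the number of apps that have both: each cell is the frequency of apps with category category_types[i] and content rating content_rating_types[j] (0 <= i < len(category_types), 0 <= j < len(content_rating_types)).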
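
The same pair counts can also be obtained with a pandas cross-tabulation. The following is a minimal sketch on a few made-up rows (an assumption for illustration); it produces (content rating, category, count) triples in the same shape as heat_map_data.

```python
import pandas as pd

# Hypothetical rows standing in for the two dataset columns.
df = pd.DataFrame({
    "Category": ["GAME", "GAME", "TOOLS", "GAME"],
    "Content Rating": ["Everyone", "Teen", "Everyone", "Everyone"],
})

# Rows are content ratings, columns are categories, cells are counts.
table = pd.crosstab(df["Content Rating"], df["Category"])

# Flatten to (content rating, category, count) triples, zeros included.
heat_map_data = [
    (rating, category, int(table.loc[rating, category]))
    for rating in table.index
    for category in table.columns
]
print(heat_map_data)
```

Pairs that never occur, such as ('Teen', 'TOOLS') here, still appear with a count of 0, which matches the zero-initialised dictionary used above.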

Pie chart using Bokeh

In this part, I will plot the pie chart for the 'Content Rating' column from the dataset.

I need to import a few more things to plot the pie chart.

from bokeh.palettes import Category20
from bokeh.plotting import figure
from bokeh.transform import cumsum
from math import pi
import panel as pn
pn.extension()

 

# getting the frequency of each 'Content Rating' type
Cnt_rating = {}
for i in data['Content Rating']:
    if i in Cnt_rating:
        Cnt_rating[i] += 1
    else:
        Cnt_rating[i] = 1

 

# plotting the pie chart
DATA = pd.Series(Cnt_rating).reset_index(name='value').rename(columns={'index':'country'})
DATA['angle'] = DATA['value']/DATA['value'].sum() * 2*pi
DATA['color'] = Category20[len(Cnt_rating)]

# note: the height argument was named plot_height in Bokeh versions before 3.0
p = figure(height=350, title="Pie Chart", toolbar_location=None,
          tools="hover", tooltips="@country: @value", x_range=(-0.5, 1.0))

r = p.wedge(x=0, y=1, radius=0.4,
        start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
        line_color="white", fill_color='color', legend_field='country', source=DATA)

p.axis.axis_label=None
p.axis.visible=False
p.grid.grid_line_color = None

bokeh_pane = pn.pane.Bokeh(p, theme="dark_minimal")
bokeh_pane
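
To clarify what the 'angle' column and the two cumsum calls compute, here is a plain-Python sketch of the same arithmetic. The counts are made up for illustration, and the code only mirrors the calculation; it does not call Bokeh.

```python
from itertools import accumulate
from math import isclose, pi

# Hypothetical counts per content rating.
counts = {"Everyone": 6, "Teen": 3, "Mature 17+": 3}
total = sum(counts.values())

# Each wedge spans its share of the full circle (2*pi radians).
angles = [value / total * 2 * pi for value in counts.values()]

# Running totals give the end angles; shifting by one gives the start angles.
end_angles = list(accumulate(angles))
start_angles = [0.0] + end_angles[:-1]

for name, start, end in zip(counts, start_angles, end_angles):
    print(f"{name}: {start:.3f} -> {end:.3f} rad")
```

Each wedge starts where the previous one ends, and the last wedge ends at 2*pi, so the wedges tile the whole circle.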

 

 

 

FAQs

1. How should I use HoloViews as a short-qualified import?

A: We recommend importing HoloViews using import holoviews as hv.

2. Why are the sizing options so different between the Matplotlib and Bokeh backends?

A: The way plot sizes are computed is handled in radically different ways by these backends, with Matplotlib building plots ‘inside out’ (from plot components with their own sizes) and Bokeh building them ‘outside in’ (fitting plot components into a given overall size). Thus there is not currently any way to specify sizes in a way that is comparable between the two backends.

3. The default figure size is so tiny! How do I enlarge it?

A: Depending on the selected backend…

# for matplotlib:
hv_obj = hv_obj.opts(fig_size=500)

# for bokeh:
hv_obj = hv_obj.opts(width=1000, height=500)

4. How do I export a figure?

A: The easiest way to save a figure is the hv.save utility, which allows saving plots in different formats depending on what is supported by the selected backend:

# Using bokeh
hv.save(obj, 'plot.html', backend='bokeh')

# Using matplotlib
hv.save(obj, 'plot.svg', backend='matplotlib')

 

5. How do I create a Layout or Overlay object from an arbitrary list?

A: You can supply a list of elements directly to the Layout and Overlay constructors. For instance, you can use hv.Layout(elements) or hv.Overlay(elements).

Key Takeaways

That brings us to the end of the article. Let us summarize it:

In this article, we explored some features of HoloViews and went through its installation. In the implementation part, we took a sample dataset and plotted various graphs: a curve plot, a histogram, a bar graph, plots converted from tabular data, a heatmap, and a pie chart.

Thus, holoviews is an excellent tool for visualization purposes.

For more information, you can visit the official website of holoviews.


Happy Coding!
