Table of contents
1. Introduction
2. Visualizing and Predicting Analysis of Cricket Match - Part 1
2.1. What is Data Science?
3. Some Interesting Analysis
4. Frequently Asked Questions
4.1. How do Data Scientists use statistics?
4.2. What is bias in data science?
4.3. What is heuristic thinking?
5. Conclusion
Last Updated: Mar 27, 2024

Visualizing and Predicting Analysis of Cricket Match - Part 1

Author Shivam Sinha

Introduction

In this article, we will discuss Data Science. But don't worry: learning should be fun, so we will learn it with the help of an entertaining example. How many of you watch the IPL? Yes, the Indian Premier League. Most of you love watching it, have a favorite team that you support throughout the season, and wish for your team to win. What if I say we can predict which team will win next season's IPL? Do you think that is even possible? Are we going to learn astrology in this article? In a way, yes: we are going to learn about a kind of astrology called Data Science. Just as an astrological prediction is not guaranteed to come true, data science only predicts results, and those predictions can't be 100% accurate. Now the question is, how are the IPL and data science related? And how will we predict the winning team and other related things? This is done with the help of Data Science, and we will learn about it in this article.

Visualizing and Predicting Analysis of Cricket Match - Part 1

What is Data Science?

Data Science comprises two words: Data and Science. Let's first understand what data is. Data is information about anything. For example, the number of apples on a tree, the taste of ice cream, the number of stars in the universe, and the percentage of people who like the government are all data. We have an enormous amount of data around us, but data alone is useless. It's important to know which data is beneficial, which data needs to be analyzed, and how patterns can be identified to make use of that data. Consider counting the number of leaves on a tree. What is it good for? It is useless data that does nothing for us. But the percentage of people who like the government is useful data. It comes in handy in politics: it helps governments understand what they should change and how they can change it, and it is useful in elections. Simply recording this data is not enough, though; we also need to analyze it, compare it, and act on it. Collecting, studying, and observing data, and making decisions based on it, is called data science.

With the help of data science, we can interpret data, derive useful information from it, and use that information in decision-making processes.

"Which players should you buy and which should you not?", "How much do you need to spend on which player?", "What is each player worth?". These questions are all related to data science, and IPL teams have started hiring companies that are experts in data analysis. Performance analytics companies analyze how good players are and develop strategies for those players. They analyze players' data in depth to understand who is better at what. One of the metrics used in the IPL is the MVPI, or Most Valuable Player Index, which is a weighted composite score of various player attributes.

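To make the idea of a weighted composite score concrete, here is a minimal sketch. The attribute names, weights, and numbers are hypothetical placeholders chosen for illustration; they are not the actual MVPI formula, which this article does not spell out.

```python
# Hypothetical weights for illustration only; the real MVPI weights are not given here.
WEIGHTS = {"batting": 0.4, "bowling": 0.4, "fielding": 0.2}


def composite_score(attributes, weights=WEIGHTS):
    """Weighted sum of attribute scores, each assumed to be on a 0-100 scale."""
    return sum(weights[name] * attributes[name] for name in weights)


# Made-up attribute scores for one player.
player = {"batting": 80.0, "bowling": 50.0, "fielding": 70.0}
score = composite_score(player)
print(round(score, 2))
```

Changing the weights changes which kind of player ranks highest, which is why the choice of weights matters as much as the raw attributes.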

Let's look at some of the bowling metrics:

I. Economy: Runs conceded / (Number of balls bowled by the bowler / 6).

II. Wicket-taking ability: Number of balls bowled / Wickets taken.

III. Consistency: Runs conceded / Wickets taken.

IV. Critical wicket-taking ability: Number of times four or five wickets were taken / Number of innings played.
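The four bowling metrics above can be sketched as a small helper. The function name, argument names, and career figures below are made up for illustration, and returning None for a zero denominator is just one possible convention.

```python
def bowling_metrics(runs_conceded, balls_bowled, wickets, big_hauls, innings):
    """Return the four bowling metrics listed above as a dict.

    big_hauls is the number of innings with four or five wickets.
    A ratio with a zero denominator is reported as None.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "economy": ratio(runs_conceded, balls_bowled / 6),
        "wicket_taking_ability": ratio(balls_bowled, wickets),
        "consistency": ratio(runs_conceded, wickets),
        "critical_wicket_taking": ratio(big_hauls, innings),
    }


# Made-up career figures, for illustration only.
m = bowling_metrics(runs_conceded=1500, balls_bowled=1200,
                    wickets=60, big_hauls=3, innings=50)
print(m)
```

For the first three metrics a lower value is better for the bowler, while for the fourth a higher value is better.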

 

Let's look at some of the batsman metrics:

I. Hard-hitting Ability: How often does a batsman hit fours and sixes? It is the number of fours and sixes a batsman has hit in his IPL career divided by the number of balls he has played.

Hard-hitting Ability = (Fours + Sixes) / Number of balls played by the batsman

II. Finishing Ability: Number of not-out innings divided by the total innings played by the player.

Finishing Ability = Not-out innings / Total innings played

III. Consistency of Player: Total runs scored / Number of times out.

IV. Running between the wickets: (Total runs - runs scored in fours and sixes) / (Total number of balls played - boundary balls).

If a batsman rates better on this fourth metric than on the hard-hitting metric, you can guess that he is good at taking singles, twos, and threes but not as good at hitting boundaries.
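The batting metrics above can be sketched in the same way. The function name, argument names, and figures are made up for illustration. For the fourth metric, this sketch assumes that the boundary term in the numerator means runs scored in boundaries (4 per four, 6 per six); that reading is an assumption, not something the article states explicitly.

```python
def batting_metrics(runs, balls, fours, sixes, innings, not_outs):
    """Return the four batting metrics listed above as a dict."""
    boundary_balls = fours + sixes
    boundary_runs = 4 * fours + 6 * sixes   # assumption: runs scored in boundaries
    dismissals = innings - not_outs

    def ratio(numerator, denominator):
        # A zero denominator is reported as None.
        return numerator / denominator if denominator else None

    return {
        "hard_hitting": ratio(boundary_balls, balls),
        "finishing": ratio(not_outs, innings),
        "consistency": ratio(runs, dismissals),
        "running_between_wickets": ratio(runs - boundary_runs,
                                         balls - boundary_balls),
    }


# Made-up career figures, for illustration only.
b = batting_metrics(runs=2000, balls=1500, fours=180, sixes=70,
                    innings=80, not_outs=20)
print(b)
```

Because the metrics are on different scales, it makes more sense to compare a player's rank on each metric against other players than to compare the raw numbers with each other.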

Data Science behind IPL metrics

This data helps us understand the strong and weak points of different players: whether a player is better at hitting boundaries or at running between the wickets, whether a bowler performs better against right-handed or left-handed batsmen, and whether a batsman performs better against fast or spin bowlers. Analysis can also work out in which stadium and in what type of weather a player performs better.

In an interview, Virender Sehwag summed up the importance of data science very well. He said that every game you play records your good performances, your bad performances, which team and which bowlers you played against, and whom you scored against. With all this data, it is easy to show, for example, that a player is good against Pakistan but did not play well against Bangladesh, or is good against South Africa but not against England. He also said he was amazed when, in 2011, a computer analyst came to him and showed him videos and various kinds of data analysis.

Data Science behind IPL teams

When bidding for players at the auction, IPL teams that don't have a lot of money want to know whether a player is worth the money they would spend on him. The most expensive player in an IPL auction is often not the best-performing player in the IPL. The Rajasthan Royals were one of the cheapest teams in the 2008 season, meaning they spent a lot less money on their players than other teams. Despite this, they won the IPL.


Some Interesting Analysis

I. One analysis of IPL matches played between 2008 and 2017 suggests that Eden Gardens and M Chinnaswamy Stadium are good venues for chasing, so winning the toss and choosing to field would be the better option there. Let's do the same kind of analysis on IPL match data.

Dataset Description: The dataset contains a total of 17 columns. Let's see them.

Attribute Information :

  • team1
  • team2
  • id
  • city
  • date
  • player_of_match
  • toss_winner
  • toss_decision
  • winner
  • result
  • result_margin
  • eliminator
  • method
  • venue
  • neutral_venue
  • umpire1
  • umpire2

 

In the code section, we will look directly at the main parts of the code. For detailed descriptions, you can download the Jupyter Notebook.

Let's load the libraries:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

 

Load the dataset:

# Adjust the path to wherever the matches CSV is stored on your machine.
df = pd.read_csv(r'C:USERhp Matches 2008-2020.csv')


Let's remove the columns that are not of use from the dataset:


df.drop(labels = ['id', 'date', 'player_of_match', 'result', 'neutral_venue', 'result_margin','umpire1', 'umpire2', 'eliminator', 'method'], axis = 1,inplace = True)
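The code further down filters on a column called venue1, with values such as 'Eden Gardens, Kolkata'. That column is not among the raw attributes, and the article does not show how it is created. A minimal sketch, assuming it is simply the venue name joined with the city (the sample rows below are a stand-in for the real data):

```python
import pandas as pd

# Tiny stand-in for the matches table; the real data comes from the CSV.
sample = pd.DataFrame({
    "venue": ["Eden Gardens", "Wankhede Stadium"],
    "city": ["Kolkata", "Mumbai"],
})

# Assumption: venue1 is "<venue>, <city>".
sample["venue1"] = sample["venue"] + ", " + sample["city"]
print(sample["venue1"].tolist())
```

Rows with a missing city would produce a missing venue1 under this approach, so they would need to be filled in first.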


Now, let's analyze which toss decision worked out better for the team that won the toss:

# Column names follow the attribute list above (toss_decision, toss_winner, winner).
chose_bat = df.toss_decision == 'bat'          # toss winner chose to bat first
toss_winner_won = df.toss_winner == df.winner  # toss winner also won the match

win_target = (chose_bat & toss_winner_won).sum()
loss_target = (chose_bat & ~toss_winner_won).sum()
win_chasing = (~chose_bat & toss_winner_won).sum()
loss_chasing = (~chose_bat & ~toss_winner_won).sum()

print('{} times the captain chose batting over fielding and won the match.'.format(win_target))
print('{} times the captain chose batting over fielding but lost the match.'.format(loss_target))
print('{} times the captain chose fielding over batting and won the match.'.format(win_chasing))
print('{} times the captain chose fielding over batting but lost the match.'.format(loss_chasing))
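The same four counts can also be summarized as a win rate per toss decision. This is a sketch on made-up rows that reuse the dataset's column names; the helper column toss_winner_won is introduced here for illustration and is not part of the dataset.

```python
import pandas as pd

# Made-up rows using the same column names as the dataset.
toy = pd.DataFrame({
    "toss_winner":   ["A", "B", "A", "C", "B", "C"],
    "toss_decision": ["bat", "field", "field", "bat", "field", "bat"],
    "winner":        ["A", "A", "A", "D", "B", "D"],
})

# True where the team that won the toss also won the match.
toy["toss_winner_won"] = toy["toss_winner"] == toy["winner"]

# The mean of a boolean column is the share of True values.
win_rate = toy.groupby("toss_decision")["toss_winner_won"].mean()
print(win_rate)
```

A rate is easier to compare than raw counts when one decision is chosen much more often than the other.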


Let's create two columns that describe how the match was won (by setting a target or by chasing the score):


chose_bat = df.toss_decision == 'bat'          # toss winner chose to bat first
toss_winner_won = df.toss_winner == df.winner  # toss winner also won the match
# The side batting first wins ("target") when the toss winner bats and wins,
# or when the toss winner fields and loses; every other match counts as a chase.
df['target'] = (chose_bat == toss_winner_won).astype(int)
df['chase'] = 1 - df['target']


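Let's extract some more useful information from the data:


# top10_stadium is the list of venue names defined in the plotting code below.
# venue1 is not a raw column: it is assumed to hold "<venue>, <city>" strings
# prepared in the notebook.
targetlist = []
chaselist = []
for stadium in top10_stadium:
    print('Analysis on "{}"'.format(stadium))
    at_venue = df[df.venue1 == stadium]
    aa = at_venue.target.sum()
    bb = at_venue.chase.sum()
    print(aa, 'times the team batting first set a good target and won the match.')
    print(bb, 'times the team batting second chased the score and won the match.')
    targetlist.append(aa)
    chaselist.append(bb)
    print()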
 

 

Let's visualize the above data for better understanding:

 

top10_stadium = ['Eden Gardens, Kolkata',
                 'Feroz Shah Kotla, Delhi',
                 'Wankhede Stadium, Mumbai',
                 'M Chinnaswamy Stadium, Bangalore',
                 'Rajiv Gandhi International Stadium, Uppal, Hyderabad',
                 'Sawai Mansingh Stadium, Jaipur',
                 'MA Chidambaram Stadium, Chepauk, Chennai',
                 'Punjab Cricket Association Stadium, Mohali, Chandigarh',
                 'Sheikh Zayed Stadium, Abu Dhabi',
                 'Maharashtra Cricket Association Stadium, Pune']
# Win counts per stadium, in the same order as top10_stadium.
data = {'target': [30, 34, 36, 26, 27, 35, 15, 15, 19, 10],
        'chase': [47, 39, 37, 36, 37, 22, 32, 20, 14, 16]}
df1 = pd.DataFrame(data, columns=['target', 'chase'], index=top10_stadium)
# Set the style before plotting so that it applies to this figure.
# On Matplotlib older than 3.6 the style is named 'seaborn-bright'.
plt.style.use('seaborn-v0_8-bright')
df1.plot.barh(figsize=(10, 10))
plt.title('Top-10 Stadiums')
plt.ylabel('Stadiums')
plt.xlabel('No. of Matches Won')
plt.xticks(np.arange(0, 54, 3))
plt.show()

The graph above shows how often teams won by setting a target or by chasing at a given stadium. Look at the horizontal bars for "Eden Gardens, Kolkata": the chase bar shows that more than 45 matches were won by chasing, while about 30 were won by the team that set the target. From this, we can conclude that this stadium is good for chasing. Therefore, if a match is played at this venue and a team wins the toss, fielding first is the better option. The rest of the plot can be read in the same way.

Plot of target and chase

 

Let's convert the above data in terms of percentage for better understanding:

 

# venue1 is assumed to hold "<venue>, <city>" strings, matching top10_stadium.
target = []
chase = []
for stadium in top10_stadium:
    print(stadium)
    at_venue = df[df.venue1 == stadium]
    aa = at_venue.target.sum()
    bb = at_venue.chase.sum()
    total = aa + bb
    t = (aa / total) * 100
    c = (bb / total) * 100
    target.append(round(t, 2))
    chase.append(round(c, 2))
    print('{:.2f}% probability that you will win the match if you bat first.'.format(t))
    print('{:.2f}% probability that you will win the match if you field first.'.format(c))
    print()

Probability of winning when batting first or fielding first
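As an alternative to looping over the stadiums, the same percentages can be computed with a pandas groupby. This is a sketch on made-up rows; the venue names are placeholders, and the venue1, target, and chase columns are assumed to exist as in the steps above.

```python
import pandas as pd

# Made-up rows: one row per match, with a 1 in either target or chase.
toy = pd.DataFrame({
    "venue1": ["Ground A", "Ground A", "Ground A", "Ground B", "Ground B"],
    "target": [1, 0, 0, 1, 1],
    "chase":  [0, 1, 1, 0, 0],
})

# Total wins of each kind per venue.
wins = toy.groupby("venue1")[["target", "chase"]].sum()

# Divide each row by its own total to turn counts into percentages.
pct = wins.div(wins.sum(axis=1), axis=0).mul(100).round(2)
print(pct)
```

The resulting table has one row per venue, so it can be passed to plot.barh directly without building the lists by hand.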

Let's visualize the above data for better understanding:

# Reuse top10_stadium from above, with the target and chase percentage lists just computed.
data = {'Bat_first': target,
        'Field_first': chase}
df2 = pd.DataFrame(data, columns=['Bat_first', 'Field_first'], index=top10_stadium)

# Set the style before plotting so that it applies to this figure.
# On Matplotlib older than 3.6 the style is named 'seaborn-bright'.
plt.style.use('seaborn-v0_8-bright')
df2.plot.barh(figsize=(10, 10))
plt.title('Top-10 Stadiums')
plt.ylabel('Stadiums')
plt.xlabel('Probability to win (%)')
plt.xticks(np.arange(0, 75, 3))
plt.show()

Probability of winning when batting first or fielding first

From this plot, we can conclude which stadiums are more suitable for setting a target. For example, at Sawai Mansingh Stadium, Jaipur, more matches were won batting first (35) than chasing (22), so batting is the better option for the team that wins the toss there. The rest of the plot can be analyzed in the same way. For more information, you can download the Jupyter notebook.

II. A further analysis considered batting averages and strike rates for all IPL players. It found that players under the age of 35 had a batting average of 24.51 and a strike rate of 126.84, while players over the age of 35 had a noticeably lower batting average of 21.34 and a strike rate of 112.1. This suggests that teams should prioritize younger players if they need to improve their performance.
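For reference, a batting average is runs scored divided by the number of dismissals, and a strike rate is runs scored per 100 balls faced. The sketch below computes both per age group on made-up career totals. The column names, the figures, and the choice to pool totals within each group are assumptions, since the article does not say how the original analysis aggregated players.

```python
import pandas as pd

# Made-up career totals; not real IPL data.
players = pd.DataFrame({
    "age":        [24, 29, 33, 36, 38],
    "runs":       [900, 1500, 1200, 800, 400],
    "dismissals": [30, 60, 50, 40, 20],
    "balls":      [700, 1150, 950, 700, 380],
})

# Label each player by age group.
players["group"] = players["age"].map(
    lambda a: "under_35" if a < 35 else "35_plus")

# Pool the totals within each group, then apply the two definitions.
totals = players.groupby("group")[["runs", "dismissals", "balls"]].sum()
totals["batting_average"] = totals["runs"] / totals["dismissals"]
totals["strike_rate"] = 100 * totals["runs"] / totals["balls"]
print(totals[["batting_average", "strike_rate"]].round(2))
```

Pooling totals weights players by how much they batted; averaging each player's own figures instead would give every player equal weight and can lead to different numbers.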

Frequently Asked Questions

How do Data Scientists use statistics?

Statistics plays a powerful role in Data Science. It is one of the most important disciplines, providing tools and methods to find structure in data and to gain deeper insight from it. It has a great impact on data acquisition, exploration, analysis, validation, and more.

What is bias in data science?

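Bias in data science is a systematic error that skews results in one direction. One example is historical data bias, which occurs when socio-cultural prejudices and beliefs are mirrored in systematic processes.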

What is heuristic thinking?

Heuristics are mental shortcuts that can facilitate problem-solving and probability judgments. These strategies are generalizations, or rules of thumb, that reduce cognitive load and can be effective for making immediate judgments. However, they often result in irrational or inaccurate conclusions.

Conclusion

In this article, we visualized IPL match data and used it to estimate which toss decision is more likely to lead to a win at a given stadium. We will continue this article in part 2.

To learn more, see Visualizing and Predicting Analysis of Cricket part 2, Cloud Computing, Microsoft Azure, C++ with Data Structure, DBMS, and Operating System by Coding Ninjas, and keep practicing on our platform Coding Ninjas Studio.
