## Algorithm

The PageRank algorithm generates a **probability distribution** that indicates the possibility of a random user clicking on links and ending up on a specific page. PageRank can be determined for any large collection of documents. Several research publications assume that the distribution is uniformly distributed among all documents in the collection at the start of the computational process. The PageRank computations necessitate numerous trips through the collection, referred to as "**iterations**", to update PageRank values to represent the** actual theoretical value** more closely.

A probability is a number between 0 and 1, representing the likelihood of something happening. A "50 per cent possibility" of something happening is typically stated as a 0.5 probability. As a result, a document with a PageRank of 0.5 means that a person clicking on a random link will be sent to that document 50% of the time.

### Simplified Algorithm

Assume a universe with four web pages, namely, A, B, C, and D. Links between pages are disregarded, as are repeated outbound links from one single page to another single page. For all pages, PageRank is set to the same value. The whole number of pages on the web at the time was the sum of PageRank over all pages in the original form of PageRank; therefore, each page in this scenario would have an initial value of 1. The remainder of this section and later versions of PageRank, on the other hand, assume a probability distribution between 0 and 1. As a result, in this example, the initial value for each page is 0.25.

On the next iteration, the PageRank sent from a specific page to the targets of its outbound links is distributed evenly across all outbound connections.

If the system's only links to A were from pages B, C, and D, each link would transmit 0.25 PageRank to A on the next iteration, for a total of 0.75.

Instead, suppose page B is linked to pages C and A, page C is linked to page A, and page D is connected to all three sites. As a result, page B would transmit half of its existing value, or 0.125, to page A and the other half, or 0.125, to page C on the first iteration. Page C's entire existing value, 0.25, would be transferred to the single page it connects to, A. D would transfer one-third of its existing value, or around 0.083, to A because it had three outgoing linkages. Page A will have a PageRank of roughly 0.458 at the end of this cycle.

To put it another way, an outgoing link's PageRank is equal to the document's PageRank score divided by the number of outbound links L().

The PageRank value for any page u can be stated as follows in general:

i.e., the PageRank value of a page u is determined by the PageRank values of each page v in the set Bu (set of all pages connecting to page u), divided by the number L(v) of links from page v.

For the calculation of PageRank, the algorithm uses a damping factor. The formula, once the damping factor is factored in, is as follows:

here, d is the damping factor, and N is the number of elements in set Bu(total pages in the collection).

## Python Implementation

The implementation of the PageRank Algorithm using Python is below:

### Code

```
import numpy as np
def pageRank(M, num_iter: int = 100, d: float = 0.85):
"""
Parameters
----------
M : numpy array
adjacency matrix where M[i,j] represents the link from 'j' to 'i', such that for all 'j' -> sum(i, M[i,j]) = 1
num_iter : int, optional
number of iterations (default 100)
d : float, optional
damping factor (default 0.85)
Returns
-------
numpy array
vector of ranks such that v[i] is the i-th rank from [0, 1],
v sums to 1
"""
N = M.shape[1]
v = np.ones(N) / N
M_hat = (d * M + (1 - d) / N)
for i in range(num_iter):
v = M_hat @ v
return v
M = np.array([[0, 0, 0, 0, 1],
[0.5, 0, 0, 0, 0],
[0.5, 1, 0, 0, 0],
[0, 0, 1, 0.5, 0],
[0, 0, 0, 0.5, 0]])
v = pageRank(M, 100, 0.85)
print(v)
```

### Output

If we calculate the sum of these five values, it will equal 1.

## Applications of PageRank Algorithm

The applications of the PageRank algorithm are stated as follows:

- The PageRank algorithm's earliest and most obvious application is in search engines. PageRank can rank websites to give more relevant search results faster because it was built expressly for use in Google's search engine.
- The PageRank algorithm is used to search networks that are not connected to the internet. This can be used in academic papers; PageRank can determine the most effective and referenced publications in a field by utilising citations instead of links.
- An example of a real-world application of the PageRank method is identifying significant species in an ecology. Applying the PageRank algorithm to the relationships between species in an ecosystem allows the user to determine the most notable species. As a result, assigning value to essential animal and plant species in an ecosystem makes it easier to predict outcomes such as extinction or a species' removal from the ecosystem.

## Advantages and Disadvantages

The advantages and disadvantages of the PageRank algorithm are as follows:

**Also Read - **__Kadanes Algorithm__

## Frequently Asked Questions

**Name some other link-based ranking algorithms.**

The HITS algorithm (developed by Jon Kleinberg and now used by Teoma and Ask.com), the IBM CLEVER project, the TrustRank algorithm, and the Hummingbird algorithm are other link-based ranking algorithms for Web pages.

**State the factors that influenced the PageRank Algorithm.**

The factors that highly influenced the PageRank Algorithm are:

-> Anchor text

-> The likelihood of being clicked

-> Internal links

-> NoFollow links

**What is the damping factor?**

According to the PageRank theory, a hypothetical surfer who clicks on links at random will soon stop clicking at some point in time. A damping factor d is the probability that the individual will continue to surf at any point. Different damping factors have been investigated in various experiments; however, it is generally anticipated that the damping factor will be fixed at around 0.85.

**Can the PageRank of a webpage be zero?**

As observed from the formula above, the PR of a page cannot be zero. If the page has no pages pointing to it, still the PR will be

The page can still be accessible if it is saved in, say, bookmarks.

## Conclusion

In this article, we learned about the PageRank algorithm. We saw its description, simplified working, applications as well as its advantages and disadvantages.

We hope this blog has helped you enhance your knowledge. If you want to learn more, check out our articles on __Data Mining and Data Analytics - Coding Ninjas Coding Ninjas Studio__, __AdaBoost Algorithm - Coding Ninjas Coding Ninjas Studio__, __Expectation-Maximization Algorithm - Coding Ninjas Coding Ninjas Studio__ and __Top 10 Common Data Mining Algorithms â€“ Coding Ninjas Blog__. Do upvote our blog to help other ninjas grow.

Head over to our practice platform __Coding Ninjas Studio__ to practice top problems, take various __guided paths__, attempt __mock tests__, read __interview experiences__, solve __problems__, participate in __contests__ and much more!

**Happy Reading!**