Aargh! Running into a broken link is frustrating. But why do links break, and how can you find them?
We are here to answer both questions.
This article explains what broken links are and how to find them using Selenium, with examples.
What are Broken Links?
A broken link, also known as a dead link, is any link on a web page that no longer works because of an underlying problem with the link or its target. A link can also be temporarily inaccessible because of server problems or an improper back-end configuration.
Clicking one of these links sometimes shows an error message, such as "page not found"; in other cases, no error message appears at all. Besides pages that return 404 errors, typical examples of broken links are links to content (such as PDFs and other documents) that has been moved or deleted.
Broken links generally return HTTP status codes in the 4xx or 5xx range.
HTTP Status Codes
The following are a few typical HTTP status codes returned by the web server when a link is broken, each listed with its description:
400 (Bad Request): The server cannot process the request because the URL is incorrect.
400 (Bad Request - Bad Host): The server cannot process the request because the host name is incorrect.
400 (Bad Request - Bad URL): The server cannot process the request because the URL is malformed, for example with missing characters such as brackets or slashes.
400 (Bad Request - Timeout): The HTTP request timed out.
400 (Bad Request - Empty): The server's response is empty, with no content and no response code.
400 (Bad Request - Reset): The server cannot process the request because it is busy processing other requests.
403 (Forbidden): The server understands the request but refuses to process it because the client lacks authorization.
404 (Page not found): The page isn't available on the server.
408 (Request Timeout): The server timed out while waiting for the request.
410 (Gone): The page is gone. Unlike 404, this status indicates that the removal is permanent.
503 (Service Unavailable): The server cannot handle the request because of a temporary overload.
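As a rough illustration of how these codes can be grouped in a test, the sketch below sorts a status code into a coarse category and treats anything in the 4xx or 5xx range as broken. The class name LinkStatus, the Category enum, and both method names are hypothetical and chosen only for this example.

```java
class LinkStatus {
    /** Coarse buckets for an HTTP status code (illustrative names, not a standard API). */
    public enum Category { OK, CLIENT_ERROR, SERVER_ERROR }

    /** Maps a status code to a bucket: 5xx and above, 4xx, or everything below 400. */
    public static Category categorize(int statusCode) {
        if (statusCode >= 500) return Category.SERVER_ERROR;
        if (statusCode >= 400) return Category.CLIENT_ERROR;
        return Category.OK;
    }

    /** A link counts as broken when its status code is 400 or above. */
    public static boolean isBroken(int statusCode) {
        return categorize(statusCode) != Category.OK;
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 403, 404, 410, 503};
        for (int code : samples) {
            System.out.println(code + " -> " + categorize(code) + ", broken=" + isBroken(code));
        }
    }
}
```

Keeping client and server errors apart can be useful in reports, since a 404 usually calls for fixing the link, while a 503 may simply call for a retry later.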
Why check for Broken Links in Selenium?
Broken links defeat the purpose of having a website, because visitors cannot reach the information or service they are looking for. A user who clicks a broken link lands on an error page, which makes for a poor user experience.
Every link on a website should be tested to ensure it works as expected. However, manually checking each link would take a significant amount of time, effort, and money, given that most websites contain hundreds or even thousands of links. It is also unnecessary, since the check can be automated with Selenium.
Here are a few more reasons to check for broken links with Selenium:
Free Availability.
Great Performance.
High Integrity.
Support for Different Programming Languages.
Why do we get Broken Links?
The following are a few reasons for getting broken links:
An incorrect or mistyped URL entered by the user.
Links to content, such as videos and documents, that has been moved or deleted. Whenever content is moved, internal links should be redirected to the new URLs.
Structural changes to the website (such as permalink changes) for which internal links or URL redirects were not configured properly.
Website maintenance that makes the site briefly inaccessible.
Broken HTML tags, corrupt embedded elements, broken JavaScript, or faulty HTML/CSS changes, which can break links on a page.
Geolocation restrictions that block access to the website from banned IP addresses or countries. Selenium geolocation testing helps verify that the experience is adapted to the location (or country) from which the site is accessed.
Common Reasons for Broken Links
Page Deletion: The linked page may have been deleted or removed from the website.
URL Change: The URL of the linked page may have been modified.
Website Restructuring: Changes in website structure or navigation could result in broken links.
Server Errors: Temporary or permanent server errors can lead to broken links.
Content Migration: During content migration, links may not be updated properly.
External Website Changes: Changes on linked external websites may cause broken links.
Typos and Misspellings: Incorrectly typed URLs or misspelled links can lead to broken links.
Expired Content: Content referenced by links may have expired or become outdated.
How to identify the Broken Links in Selenium?
It is easy to check for broken links in Selenium. The basic rules of broken link testing using Selenium remain the same regardless of the language used with it. The steps for testing broken links with Selenium WebDriver are:
Go to the website you want to test.
Right-click a web element and choose the Inspect option from the context menu.
Collect every link on the page by locating the <a> tags.
For each link, send an HTTP request.
Check the response code returned in reply to the request sent in the previous step.
Use the server's response code to determine whether the link is active.
Repeat steps 4 to 6 for every link on the page.
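The request-and-check loop in these steps can be sketched without a browser. In the hedged example below, the class BrokenLinkAudit, the method findBroken, and the statusOf parameter are hypothetical names: statusOf stands in for "send an HTTP request and read the response code", and the list of hrefs stands in for the values a Selenium test would read from the page's <a> elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

class BrokenLinkAudit {
    /**
     * Returns the hrefs whose status code is 400 or above.
     * statusOf is a placeholder for the real HTTP lookup, so the loop can be tried offline.
     */
    public static List<String> findBroken(List<String> hrefs, ToIntFunction<String> statusOf) {
        List<String> broken = new ArrayList<>();
        for (String href : hrefs) {
            if (href == null || href.isEmpty()) {
                continue; // anchor without a usable href, nothing to request
            }
            int code = statusOf.applyAsInt(href); // send the request, read the response code
            if (code >= 400) {
                broken.add(href); // 4xx and 5xx responses mark the link as broken
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Assumed sample data; a Selenium test would collect these from the page instead.
        List<String> hrefs = List.of("https://example.com/", "https://example.com/missing");
        // Fake lookup for the demo: pretend that only the "/missing" page returns 404.
        ToIntFunction<String> fakeStatus = href -> href.endsWith("/missing") ? 404 : 200;
        System.out.println(findBroken(hrefs, fakeStatus)); // prints [https://example.com/missing]
    }
}
```

Passing the lookup in as a function is only a design choice for this sketch: it lets the loop be exercised with fake data before a real HTTP call is plugged in.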
Examples
We will now walk through finding broken links in Selenium step by step, with example code.
Assuming links is the List<WebElement> of anchor elements collected from the page (for example, with driver.findElements(By.tagName("a"))), iterate through the list with the following command:
Iterator<WebElement> itr = links.iterator();
The steps below run once per link, inside a while (itr.hasNext()) loop.
Step 3: Locating and Verifying the URL.
This step checks whether the URL is null or empty and whether it belongs to a third-party domain. First, obtain the anchor tag's href and store it in the url variable.
url = itr.next().getAttribute("href");
Here, url is a String variable.
If the URL is null or empty, skip the remaining steps for this link.
if (url == null || url.isEmpty()) {
    System.out.println("URL is either empty or not specified for anchor tag");
    continue;
}
Next, verify whether the URL belongs to the primary domain or to a third party. If it belongs to a third party, skip the remaining steps. Here, homePage is assumed to be a String holding the base URL of the site under test.
if (!url.startsWith(homePage)) {
    System.out.println("URL refers to some other domain, avoiding it.");
    continue;
}
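One caveat with the startsWith check is that it is a plain prefix test, so a URL such as https://example.com.evil.org/ would pass when homePage is https://example.com. The sketch below shows one possible alternative that compares host names instead. The class LinkFilter and the method isSameHost are hypothetical names used only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class LinkFilter {
    /**
     * Returns true only when url is non-empty, parses as a URI,
     * and has the same host as homePage (compared case-insensitively).
     */
    public static boolean isSameHost(String url, String homePage) {
        if (url == null || url.isEmpty()) {
            return false; // nothing to check
        }
        try {
            String linkHost = new URI(url).getHost();      // null for mailto:, javascript:, etc.
            String homeHost = new URI(homePage).getHost();
            return linkHost != null && linkHost.equalsIgnoreCase(homeHost);
        } catch (URISyntaxException e) {
            return false; // unparsable href, treat it as not checkable
        }
    }

    public static void main(String[] args) {
        String home = "https://example.com/"; // assumed base URL for the demo
        System.out.println(isSameHost("https://example.com/about", home));
        System.out.println(isSameHost("https://example.com.evil.org/", home));
        System.out.println(isSameHost("mailto:someone@example.com", home));
    }
}
```

Whether subdomains such as www.example.com should count as the same site depends on the project; this sketch treats them as different hosts.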
Step 4: Submit an HTTP request.
The HttpURLConnection class provides methods to send HTTP requests and read HTTP response codes. The URLConnection returned by the openConnection() method is therefore cast to HttpURLConnection.
HUC = (HttpURLConnection) new URL(url).openConnection();
Here, HUC is an HttpURLConnection variable and url is the href string obtained in Step 3.
We can set the request method to "HEAD" rather than "GET" so that only the headers are returned and not the document body.
HUC.setRequestMethod("HEAD");
Calling the connect() method opens the connection to the URL; the response is read in the next step.
HUC.connect();
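The HEAD request described in this step can be wrapped in a small helper. This is a minimal sketch, assuming the link uses http or https. The class HeadRequest, the method statusOf, and the timeoutMillis parameter are hypothetical additions; the timeouts are not part of the steps above and are included so that an unresponsive server does not block the test indefinitely. The sketch builds the URL with URI.create(url).toURL() because the URL(String) constructor is deprecated in recent JDK releases.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class HeadRequest {
    /**
     * Sends a HEAD request to an http/https URL and returns the HTTP status code.
     * Assumes the URL is absolute; other schemes would fail the cast below.
     */
    public static int statusOf(String url, int timeoutMillis) throws IOException {
        HttpURLConnection huc = (HttpURLConnection) URI.create(url).toURL().openConnection();
        huc.setRequestMethod("HEAD");         // headers only, no document body
        huc.setConnectTimeout(timeoutMillis); // give up if the connection cannot be opened in time
        huc.setReadTimeout(timeoutMillis);    // give up if the server stops responding
        try {
            return huc.getResponseCode();     // connects if needed and reads the status line
        } finally {
            huc.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HeadRequest <http-or-https-url>");
            return;
        }
        System.out.println(args[0] + " -> " + statusOf(args[0], 5000));
    }
}
```

Some servers do not support HEAD and answer with an error such as 405, so a test may want to retry those links with GET before reporting them as broken.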
Step 5: Verifying Links.
We can obtain the response code for the request using the getResponseCode() method.
response = HUC.getResponseCode();
Here, response is an int variable.
We then check the link status based on the response code.
if (response >= 400) {
    System.out.println(url + " is not a valid link");
} else {
    System.out.println(url + " is a working link");
}
In this way, we can retrieve all links from a web page and print whether each one is active or broken.
Hooray!! You have successfully found the broken links in Selenium.
How to get ALL Links of a Web Page
To retrieve all links from a web page:
Parse HTML: Use a library like BeautifulSoup in Python or Jsoup in Java to parse the HTML content of the web page.
Find Links: Utilize methods provided by the parsing library to find all anchor tags (<a>) in the HTML content.
Extract URLs: Extract the href attribute from each anchor tag to obtain the URL of the linked page.
Filter Links: Optionally filter the extracted URLs based on criteria such as URL format, domain, or relevance.
Store Links: Store the extracted links in a data structure like a list or set for further processing or analysis.
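The steps above can be sketched in plain Java. To stay dependency-free, this example swaps the HTML parser for a deliberately simple regular expression, which is an assumption of the sketch and will miss unquoted or unusual markup; a real parser such as Jsoup is the more robust choice. The class LinkExtractor and the method extractLinks are hypothetical names.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Matches the quoted href value of an <a> tag. Simplified on purpose, not a full HTML parser.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Extracts http/https links, resolves relative ones against baseUrl, and removes duplicates. */
    public static Set<String> extractLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Set<String> links = new LinkedHashSet<>(); // keeps page order, drops repeats
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            if (href.isEmpty() || href.startsWith("#")) {
                continue; // empty href or in-page fragment
            }
            try {
                URI resolved = base.resolve(href); // turns "/about" into an absolute URL
                String scheme = resolved.getScheme();
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    links.add(resolved.toString()); // filter out mailto:, javascript:, etc.
                }
            } catch (IllegalArgumentException e) {
                // href is not a valid URI, skip it
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/about\">About</a> <a href='https://other.org/x'>Other</a>"
                + " <a href=\"#top\">Top</a> <a href=\"/about\">About again</a>";
        System.out.println(extractLinks(html, "https://example.com/"));
    }
}
```

A LinkedHashSet is used here so that each URL is requested only once while the original page order is preserved for reporting.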
Frequently Asked Questions
What are the types of Broken Links?
There are three main types of broken links: broken internal links, broken external links, and broken backlinks. A broken internal link leads to a missing page on your own website, while a broken external link leads to a missing page on another website. A broken backlink, on the other hand, is a link on another domain that points to a page on your website that is no longer accessible.
How can broken links be found using Selenium WebDriver?
Collect all the links on the page by locating the <a> tags. Send an HTTP request to each link and read the response code. Then use the HTTP response code to determine whether the link is working.
How to find broken links in a Selenium program?
To find broken links in a Selenium program, first, gather all page links. Then, send HTTP requests to each link and check response codes (e.g., 404). Identify broken links and report them for further action.
Conclusion
This article discussed how to find broken links in Selenium with examples.
Every link on a website should be tested to ensure it works as expected, but manually checking each link takes a significant amount of time, effort, and money. Automating the check with Selenium makes it easy to find broken links.
We hope this blog has helped you.