Introduction
Hello Ninjas, we are back with another R programming blog, this time on factors in R. R is an open-source language widely used for statistical analysis and data analysis, and it has some unique features that make it very powerful.
Further in this article, we will cover what factors in R programming are and what they are used for, with the help of some examples.
So let's start with what factors in R programming are.
Definition
What do you mean by factors? Factors are data structures used to store categorical data, that is, data that takes a limited set of values. A factor can be built from a vector of any basic type, such as integers or character strings. Internally, a factor stores each value as an integer code that points to a label. The set of distinct labels is known as the levels; by default, R sorts the levels in ascending order, which is alphabetical for character data.
Factors can be ordered and unordered and are essential for statistical analysis and plotting.
These factors are created using some functions, which we will see in the following sections of this blog.
Examples of data that suit factors in R programming include the quality of food, which can be categorized into levels such as good, bad, or best, and directions, which can be classified as North, West, South, and East. As you can see, each level is a unique value.
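To make the link between labels and integer values concrete, here is a small sketch using the directions example. The variable names and the repeated values are made up for illustration.

```r
# Hypothetical example data: compass directions with repeats
directions <- c("North", "West", "South", "East", "North", "South")
dir_factor <- factor(directions)

# The unique labels, sorted alphabetically by default
levels(dir_factor)

# The integer code stored for each element (its position in levels)
as.integer(dir_factor)
```

Each element is stored as the position of its label in the levels, so "East" becomes 1 and "West" becomes 4.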
The factor() function in R programming takes six main arguments.
x- The input vector, which needs to be transformed into a factor.
nmax- It specifies an upper bound on the number of levels.
levels- It contains the unique values that x can take; by default, these are the sorted unique values found in x.
labels- It is a character vector of labels for the levels, given in the same order as the levels.
exclude- It specifies the values to leave out when forming the set of levels.
ordered- It is a logical argument that determines whether the levels are treated as ordered.
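As a minimal sketch of how the levels, labels, ordered, and exclude arguments fit together, consider the example below. The short response codes are made up for illustration.

```r
# Hypothetical survey responses coded as short strings
responses <- c("g", "b", "x", "g", "unknown")

quality <- factor(
  responses,
  levels  = c("b", "g", "x"),          # allowed values, in the order we want
  labels  = c("bad", "good", "best"),  # display label for each level
  ordered = TRUE                       # treat levels as ranked: bad < good < best
)
quality  # "unknown" is not among the levels, so it becomes NA

# exclude drops a value from the set of levels; matching elements become NA
no_bad <- factor(c("good", "bad", "best"), exclude = "bad")
levels(no_bad)
```

Values that do not match any level are not an error; they are silently stored as NA, so it is worth checking for them after conversion.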
Use of Factors
Factors are used to represent categorical data. One of their most important uses is statistical modeling, since categorical variables (variables that take one of two or more categories) enter into statistical models differently from continuous variables.
They are useful when displaying character vectors(like a name) in a non-alphabetical order.
They are useful when there are lots of repeating values.
Factors can also be a more compact way to store character values with many repeats, because each distinct value is stored only once in the levels and the data itself is stored as a vector of integers.
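The sketch below illustrates two of these uses: displaying values in a non-alphabetical order and storing them as integers. The month abbreviations are made-up sample data.

```r
# Hypothetical data: month abbreviations in no particular order
months <- c("Mar", "Jan", "Feb", "Jan", "Mar")

# A plain character vector sorts alphabetically
sort(months)

# A factor sorts by the order of its levels instead
month_factor <- factor(months, levels = c("Jan", "Feb", "Mar"))
sort(month_factor)

# Under the hood the data is a vector of integers
typeof(month_factor)
```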
Let's start with how to create a factor in R programming.
Creating a Factor in R
So in this section, we will cover how to create a factor in R programming. To create a factor, the factor() command is used, which will take a vector as an input.
Let's say we create a variable called food that will store the quality of food as its values.
Example:
# Creating a character vector
food<- c("good", "bad", "best")
food
Output:
Now we will check whether 'food' is a factor or not using the is.factor() command.
# Checking if the food variable is a factor or not
food <- c("good", "bad", "best")
food
is.factor(food)
Output:
Explanation:
As you can see, FALSE is printed, which means the food variable is not a factor; it is a character vector.
Note: The class() function can also be used to check whether the variable/ vector defined is a factor or not.
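A short sketch of the class() check mentioned in the note; food_factor is a name introduced here for illustration.

```r
food <- c("good", "bad", "best")

# A plain vector of strings reports its class as "character"
class(food)

# After conversion, the class is "factor"
food_factor <- factor(food)
class(food_factor)
```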
Let's see how we can convert the food variable into a factor.
# Checking whether the food variable is a factor or not
food <- c("good", "bad", "best")
food
is.factor(food)
# Using factor command, which is present in the base package
factor(food)
Output:
Explanation:
factor(food) returns the values of food as a factor, and the levels are printed as bad, best, and good. To keep the result, assign it back with food <- factor(food).
Now let's see what happens if a particular value appears more than once in the food factor.
# Checking the uniqueness of levels
food <- c("good", "bad", "bad")
food
is.factor(food)
# Using factor command, which is present in the base package
factor(food)
Output:
Explanation:
As you can see, levels will not repeat; they will always be unique.
Note:
You may wonder why the order in which we defined our values differs from the order printed in the levels. This is because levels are sorted alphabetically, and b comes before g, even though the first element in this vector is "good".
When working with the factors in R programming, the two most common and used operations are changing the order of the levels as per convenience and changing the values of the levels. These two operations are described in the sections below.
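Changing the values of the levels can be sketched as follows. This is one possible approach using levels() on the left-hand side of an assignment, and the replacement label "poor" is made up for illustration.

```r
food <- factor(c("good", "bad", "best"))

# Rename the level "bad" to "poor"; every element with that level changes too
levels(food)[levels(food) == "bad"] <- "poor"

food
levels(food)
```

Only the label changes; the underlying integer codes and the position of the level stay the same.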
Specifying the Order of Levels as per Choice
Sometimes the order of a factor's levels does not matter; other times, you might want to specify the order because it is meaningful (e.g., "bad", "good", "best") or because a particular type of analysis requires it.
Before we specify the order ourselves, the levels are printed in alphabetical order. After we specify it, they are printed in the order we chose.
This can be done using an ordered factor:
# Specifying the order of levels
food <- ordered(c("good", "good", "best", "bad"))
food <- ordered(food, levels = c("bad", "good", "best"))
food
Output:
Explanation:
Here we have used an ordered factor to get our levels in the order we want.
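One practical consequence of an ordered factor is that its elements can be compared. The sketch below builds the same food factor in a single factor() call with ordered = TRUE, which is an alternative to the two-step ordered() call used above.

```r
food <- factor(
  c("good", "good", "best", "bad"),
  levels  = c("bad", "good", "best"),
  ordered = TRUE
)

# Comparisons follow the level order: bad < good < best
food[1] < food[3]

# The highest level present in the data
max(food)

# Which elements are at least "good"?
food >= "good"
```

Comparisons like these are not meaningful for unordered factors, where R returns NA with a warning.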
Accessing Elements of Factors
Elements of factors in R programming can be accessed using square brackets []. Let's discuss how we can access the elements using an example.
Example:
# Accessing the elements of a factor
food <- factor(c("bad", "good", "best", "good"))
food[2]
food[3]
Output:
Explanation:
The second and third values of factor food are being printed as a result.
More than one element can be accessed by passing a vector of positions, built with c(), inside the square brackets.
Example:
# Accessing more than one element of a factor
food <- factor(c("bad", "good", "best", "good"))
food[c(2,4)]
food[3]
Output:
Explanation:
As we can see, good is printed twice because we asked for the second and fourth values of the factor food; food[3] then prints best.
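Two related details are worth a quick sketch: negative indices drop elements, and a subset keeps all the original levels unless they are dropped explicitly. The name subset_food is introduced here for illustration.

```r
food <- factor(c("bad", "good", "best", "good"))

# A negative index returns everything except that position
food[-1]

# A subset still carries all three original levels
subset_food <- food[c(2, 4)]
levels(subset_food)

# droplevels() removes levels that no longer occur in the data
levels(droplevels(subset_food))
```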
Modification of Factor
We can modify the value of a factor simply by reassigning it. However, the new value must be one of the predefined levels of the factor. Let's understand the modification of a factor with the help of an example.
Example:
# Modifying the components of the food factor
food <- factor(c("bad", "good", "best", "good", "bad"))
food[1]<- "best"
food
Output:
Explanation:
The value of the first component in the food factor is being reassigned to "best".
Let's see this with the help of another example.
Example:
# Modifying the components of the food factor
food <- factor(c("bad", "good", "best", "good", "bad"))
food[4]<- "best"
food
# Cannot assign a value that is not one of the levels
food[2]<- "average"
food
Output:
Explanation:
The 4th element of the food factor is changed to "best". Because "average" is not one of the levels of the factor (bad, best, good), R cannot assign it: it issues an "invalid factor level" warning and stores NA in the 2nd position instead.
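Assigning a value that is not among the existing levels produces NA with a warning. One way around this, sketched below, is to extend the levels first; the new level "average" is made up for illustration.

```r
food <- factor(c("bad", "good", "best", "good", "bad"))

# Add a new level to the end of the existing ones
levels(food) <- c(levels(food), "average")

# Now the assignment succeeds without a warning
food[2] <- "average"
food
```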
Factors in Data Frame
By now, you have seen that factors in R programming are used to categorize data and store it in levels. Now let's see how factors are created in data frames.
In R versions before 4.0.0, data.frame() treated every text column as categorical data and converted it to a factor by default. Since R 4.0.0, text columns stay as character vectors unless you pass stringsAsFactors = TRUE or convert the column yourself with factor().
Example:
# Create vectors for the data frame
food_item<- c("pizza", "pasta", "fries", "burger")
quality<- c("good", "best", "good", "bad")
# Create the data frame, converting text columns to factors
input<- data.frame(food_item, quality, stringsAsFactors = TRUE)
print(input)
# Print the quality column
print(input$quality)
Output:
Explanation:
A data frame is created from food_item and quality using the data.frame() function, and the quality column is printed as a factor with the levels bad, best, and good.
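If only some columns should be factors, a single column can be converted after the data frame is built. The following is a minimal sketch of that approach, with the level order chosen here for illustration.

```r
# Keep text columns as character vectors when building the data frame
input <- data.frame(
  food_item = c("pizza", "pasta", "fries", "burger"),
  quality   = c("good", "best", "good", "bad"),
  stringsAsFactors = FALSE
)

# Convert only the quality column, with an explicit level order
input$quality <- factor(input$quality, levels = c("bad", "good", "best"))

# Inspect the column types: food_item is chr, quality is a Factor
str(input)
```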
Using Factors
Let's go through some predefined R programming functions to see how factors are used.
Table Function
The table() function in R programming counts how often each combination of categorical values occurs.
Example:
# Create vectors for the data frame
food_item<- c("pizza", "pasta", "fries", "burger")
quality<- c("good", "best", "good", "bad")
# Create the data frame
input<- data.frame(food_item, quality)
# Calling the table function
table(input)
Output:
Explanation:
The data frame is tabulated using the table() function, which gives a count for each combination of food_item and quality.
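When table() is applied to a single factor, it counts every level, including levels that never occur. A small sketch, with a made-up extra level "average":

```r
quality <- factor(
  c("good", "best", "good", "bad"),
  levels = c("bad", "average", "good", "best")
)

# One count per level; "average" appears with a count of 0
counts <- table(quality)
counts
```

This is a difference from tabulating a character vector, where values that do not occur cannot appear at all.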
Summary Function
The summary() function gives a short summary of an object. For a factor, it reports how many times each level occurs and how many values are missing, which helps to find missing data.
To reorder the levels of a factor in R programming, the factor() function is called again with the levels argument. Let's see this with the help of an example.
Example:
# Create data frame called input
input <- data.frame(food=factor(c('pizza', 'pasta', 'fries', 'burger')),
sales=c(7, 8, 10, 14))
# View data frame
input
levels(input$food)
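# Re-order the factor levels
input$food <- factor(input$food, levels=c('pizza', 'pasta', 'fries', 'burger'))
# Display factor levels of food
levels(input$food)
# Sort the rows of the data frame by the new level order
input <- input[order(input$food),]
Output:
Explanation:
The levels of the food factor are reordered from the default alphabetical order (burger, fries, pasta, pizza) to pizza, pasta, fries, burger. Every value in the data must appear in the new levels; any value that is left out becomes NA.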
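The summary() function described at the start of this section can be sketched on a factor with a missing value. The data below is made up for illustration.

```r
# Hypothetical quality ratings with one missing value
quality <- factor(c("good", "best", NA, "good", "bad"))

# Counts per level, plus a separate count of missing values
quality_summary <- summary(quality)
quality_summary
```

The missing value is not counted under any level; it is reported in its own entry at the end.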
Frequently Asked Questions
What is R programming language?
R is an open-source programming language and environment for statistical computing, data analysis, and data science.
What are the different data structures in R programming?
Data structures in R programming include vectors, matrices, arrays, lists, data frames, and factors.
What is the use of factors in R programming?
Factors are used to store categorical data and offer an efficient way to keep repeated character values. Most importantly, they are helpful in data analysis and statistical modeling.
Which command is used to get the number of levels in factor in R programming?
The nlevels() function in R programming returns the number of levels.
How do we find out whether a given vector is a factor or not?
We can check whether a given vector is a factor using the class() and is.factor() functions.
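A quick sketch of nlevels() on a factor with a repeated value:

```r
food <- factor(c("good", "bad", "bad", "best"))

# Four elements, but only three distinct levels
length(food)
nlevels(food)
```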
Conclusion
Kudos, Ninjas; you have finally made it to the end of the blog on Factors in R Programming. We have seen what factors in R programming are, what they are used for, how to create, order, access, and modify them, and finally which predefined functions put factors to use.
Let's continue looking for more articles similar to this.