Table of contents
1. Introduction
2. Definition
2.1. Attributes of a Factor
3. Use of Factors
4. Creating a Factor in R
5. Specifying the Order of Levels as per Choice
6. Accessing Elements of Factors
7. Modification of Factor
8. Factors in Data Frame
9. Using Factors
9.1. Table Function
9.2. Summary Function
10. Reordering Factor Levels in R
11. Frequently Asked Questions
11.1. What is R programming language?
11.2. What are the different data structures in R programming?
11.3. What is the use of factors in R programming?
11.4. Which command is used to get the number of levels in factor in R programming?
11.5. How do we find that the given vector is a factor or not?
12. Conclusion
Last Updated: Mar 27, 2024

Factors in R Programming

Introduction

Hello Ninjas, we are back with another R Programming Language blog, this time on Factors in R Programming. R is an open-source language widely used for statistical computing and data analysis, and factors are one of the features that make it well suited to that work.

In this article, we will cover what factors in R programming are and how they are used, with the help of some examples.

So let's start with what factors in R programming are.

Definition

What do you mean by factors? Factors are data structures used to store categorical data, that is, data whose values come from a fixed set of categories. Internally, a factor is an integer vector in which each integer points to a character label. The set of distinct labels is known as the levels of the factor; by default, R sorts the levels in alphabetical order.

Factors can be ordered or unordered and are essential for statistical analysis and plotting.

These factors are created using some functions, which we will see in the following sections of this blog.

Examples of factors in R programming include the quality of food, which can be categorized into levels such as good, bad, or best, and directions, which can be classified as North, West, South, and East. In each case, the levels are unique values.
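To make the link between labels and integers concrete, here is a minimal sketch using the directions example. The variable names are chosen only for illustration.

```r
# Illustrative data: a factor of directions with one repeated value
directions <- factor(c("North", "West", "South", "East", "North"))

# The levels are the unique values, sorted alphabetically by default
direction_levels <- levels(directions)

# Each element is stored as an integer position within the levels
direction_codes <- as.integer(directions)

print(direction_levels)
print(direction_codes)
```

Here "North" appears twice in the data but only once in the levels, and both occurrences share the same integer code.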

Attributes of a Factor

The factor() function, which creates a factor, takes six arguments. Together they determine the attributes of the resulting factor.

x - The input vector, which needs to be transformed into a factor.

nmax - An upper bound on the number of levels.

levels - The unique values that x may take. By default, these are the sorted unique values of x.

labels - A character vector of labels for the levels, given in the same order as levels.

exclude - The values to be excluded when forming the levels. By default, only NA is excluded.

ordered - A logical flag that determines whether the levels are treated as ordered.
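As a rough illustration of how these arguments work together, the sketch below builds a factor from abbreviated codes. The data and the abbreviations ("b", "g", "be", "x") are made up for this example.

```r
# Hypothetical survey codes; "x" stands for an invalid answer
responses <- c("b", "g", "x", "be", "g")

rating <- factor(
  responses,
  levels  = c("b", "g", "be"),          # allowed values, in the desired order
  labels  = c("bad", "good", "best"),   # display labels for those levels
  exclude = "x",                        # treat "x" as missing
  ordered = TRUE                        # the levels have a meaningful order
)

print(rating)
```

The third element becomes NA because "x" is not among the levels, and the remaining elements are shown with their labels rather than their original codes.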

Use of Factors

  • Factors are used to represent categorical data. One of their most important uses is statistical modeling, since R's modeling functions treat factors as categorical variables (variables that take one of two or more categories).

  • They are useful when character values (such as month names) need to be displayed or sorted in a non-alphabetical order.

  • They are useful when a vector contains many repeated values.

  • Factors also give a compact representation of such data: each distinct character value is stored once in the levels, and the data itself is stored as a vector of integers.
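The point about non-alphabetical order can be sketched with month names. This example relies on month.abb, a constant built into R; the other names are illustrative.

```r
# Month abbreviations in arbitrary order (illustrative data)
months_chr <- c("Mar", "Jan", "Dec", "Feb")

# Sorting a character vector is alphabetical
sorted_chr <- sort(months_chr)

# Sorting a factor follows the order of its levels instead
months_fct <- factor(months_chr, levels = month.abb)
sorted_fct <- as.character(sort(months_fct))

print(sorted_chr)
print(sorted_fct)
```

The character sort puts Dec first, while the factor sort gives calendar order.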
     

Let's start with how to create a factor in R programming.

Creating a Factor in R

In this section, we will cover how to create a factor in R programming. To create a factor, the factor() function is used, which takes a vector as input.

Let's say we create a variable called food that will store the quality of food as its values.

Example:

# Creating a character vector (not yet a factor)
food <- c("good", "bad", "best")
food


Output:

[1] "good" "bad"  "best"

Now we will check whether 'food' is a factor or not using the is.factor() function.

# Checking if the food variable is a factor or not
food <- c("good", "bad", "best")
food
is.factor(food)


Output:

[1] "good" "bad"  "best"
[1] FALSE

Explanation:

As you can see, FALSE is printed, which means the food variable is not a factor; it is still a character vector.

Note: The class() function can also be used to check whether the variable/ vector defined is a factor or not.

Let's see how we can convert the food variable into a factor.

# Checking if the food variable is a factor or not
food <- c("good", "bad", "best")
food
is.factor(food)

# Using the factor() function, which is part of the base package
factor(food)


Output:

[1] "good" "bad"  "best"
[1] FALSE
[1] good bad  best
Levels: bad best good

Explanation:

factor(food) returns a factor whose levels are printed as bad, best, and good. Note that food itself is unchanged; to keep the factor, assign the result, for example food <- factor(food).
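The class() function mentioned in the note above gives another view of the same conversion. The following is a minimal sketch; the name food_factor is chosen here only for illustration.

```r
food <- c("good", "bad", "best")
class_before <- class(food)

# factor() returns a new object, so assign it to keep the result
food_factor <- factor(food)
class_after <- class(food_factor)

print(class_before)
print(class_after)
print(is.factor(food_factor))
```

Storing the result under a new name keeps the original character vector available for comparison.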

Now let's see what happens if a particular value appears more than once in the food vector.

# Checking the uniqueness of levels
food <- c("good", "bad", "bad")
food
is.factor(food)

# Using factor command, which is present in the base package
factor(food)


Output:

[1] "good" "bad"  "bad"
[1] FALSE
[1] good bad  bad
Levels: bad good

Explanation:

As you can see, levels will not repeat; they will always be unique.

Note:

You may wonder why the sequence in which we defined our values differs from the sequence printed in the levels. This is because levels are sorted alphabetically by default, and b comes before g, even though the first element in this vector is "good".

When working with factors in R programming, the two most common operations are changing the order of the levels and changing the values stored in the factor. These two operations are described in the sections below.

Specifying the Order of Levels as per Choice

Sometimes the order of a factor's levels does not matter; other times, you might want to specify the order because it is meaningful (e.g., "bad", "good", "best") or because a particular type of analysis requires it.

First, let's look at the levels before specifying an order of our choice.

Example:

# Creating a food factor
food <- factor(c("bad", "good", "best", "best", "good"))
levels(food)


Output:

[1] "bad"  "best" "good"

Explanation:

Levels are printed according to alphabetical order.

Now let's specify the order of the levels ourselves.

This can be done using an ordered factor:

# Specifying the order of levels
food <- ordered(c("good", "good", "best", "bad"))
food <- ordered(food, levels = c("bad", "good", "best"))
food


Output:

[1] good good best bad
Levels: bad < good < best

Explanation:

Here we have used an ordered factor to get our levels in the order we want. The < signs in the output show that R now treats bad as lower than good, and good as lower than best.
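One practical consequence of an ordered factor is that its elements can be compared. The sketch below assumes the same bad, good, best ordering and creates the factor in a single factor() call; the names other than food are illustrative.

```r
# Ordered factor with the levels bad < good < best
food <- factor(c("good", "good", "best", "bad"),
               levels = c("bad", "good", "best"),
               ordered = TRUE)

# Compare two elements: is "bad" lower than "good"?
is_worse <- food[4] < food[1]

# The highest rating present in the data
best_rating <- max(food)

# Which elements are rated "good" or better?
at_least_good <- food >= "good"

print(is_worse)
print(best_rating)
print(at_least_good)
```

These comparisons are only defined for ordered factors; on an unordered factor, R returns NA with a warning instead.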

Accessing Elements of Factors 

Elements of factors in R programming can be accessed using square brackets [ ]. Let's discuss how we can access the elements using an example.

Example:

# Accessing the elements of a factor
food <- factor(c("bad", "good", "best", "good"))
food[2]
food[3]


Output:

[1] good
Levels: bad best good
[1] best
Levels: bad best good

Explanation:

The second and third values of factor food are being printed as a result.

More than one element can be accessed by passing a vector of positions, built with c(), inside the square brackets.

Example:

# Accessing more than one element of a factor
food <- factor(c("bad", "good", "best", "good"))
food[c(2,4)]
food[3]


Output:

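[1] good good
Levels: bad best good
[1] best
Levels: bad best good

Explanation:

As we can see, good is printed twice because we asked for the second and fourth values of the factor food; food[3] then prints best. In every case, the full set of levels is printed along with the selected elements.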
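Two related ways of selecting elements are negative positions and logical conditions. The sketch below also shows droplevels(), a base R function that removes levels no longer present in the data; the variable names are illustrative.

```r
food <- factor(c("bad", "good", "best", "good"))

# A negative position drops that element and keeps the rest
without_first <- food[-1]

# A logical condition keeps only the matching elements
only_good <- food[food == "good"]

# Subsetting keeps all original levels; droplevels() removes unused ones
dropped <- droplevels(only_good)

print(without_first)
print(levels(only_good))
print(levels(dropped))
```

Keeping the unused levels is often what you want for tables and plots, so droplevels() is an explicit opt-in.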

Modification of Factor

We can modify a value of a factor simply by reassigning it. However, the new value must be one of the factor's predefined levels. Let's understand the modification of a factor with the help of an example.

Example:

# Modifying the components of the food factor
food <- factor(c("bad", "good", "best", "good", "bad"))
food[1]<- "best"
food


Output:

[1] best good best good bad
Levels: bad best good

Explanation:

The value of the first component in the food factor is being reassigned to "best".

Let's see this with the help of another example.

Example:

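# Modifying the components of the food factor
food <- factor(c("bad", "good", "best", "good", "bad"))
food[4] <- "best"
food

# Cannot assign values outside levels: "average" is not a level
food[5] <- "average"
food


Output:

[1] bad  good best best bad
Levels: bad best good
Warning message:
In `[<-.factor`(`*tmp*`, 5, value = "average") :
  invalid factor level, NA generated
[1] bad  good best best <NA>
Levels: bad best good

Explanation:

The 4th element of the food factor is changed to "best". The second assignment fails because "average" is not one of the predefined levels: R issues a warning and stores NA in the 5th position instead.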
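When a value outside the current levels is genuinely needed, one common approach is to extend the levels first and assign afterwards. A minimal sketch, using "average" as a made-up new category:

```r
food <- factor(c("bad", "good", "best", "good", "bad"))

# Add the new category to the existing levels
levels(food) <- c(levels(food), "average")

# The assignment is now valid and produces no NA
food[5] <- "average"

print(food)
```

Appending to levels(food) leaves the existing elements untouched, because the positions of the old levels do not change.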

Factors in Data Frame

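By now, you have seen that factors in R programming are used to categorize data and store it in levels. Now let's see how factors are created in data frames.

In versions of R before 4.0.0, data.frame() converted text columns to factors by default. Since R 4.0.0, text columns stay as character vectors unless stringsAsFactors = TRUE is passed, so the example below sets it explicitly.

Example:

# Create vectors for the data frame
food_item <- c("pizza", "pasta", "fries", "burger")
quality <- c("good", "best", "good", "bad")

# Create the data frame, converting text columns to factors
input <- data.frame(food_item, quality, stringsAsFactors = TRUE)
print(input)

# Print the quality column
print(input$quality)


Output:

  food_item quality
1     pizza    good
2     pasta    best
3     fries    good
4    burger     bad
[1] good best good bad
Levels: bad best good

Explanation:

A data frame is created from food_item and quality using the data.frame() function. Printing the quality column shows that it is a factor with the levels bad, best, and good.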
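It is also possible to keep the text columns as plain strings and convert only the column that is really categorical. The sketch below is one way to do that, with a level order chosen for illustration.

```r
food_item <- c("pizza", "pasta", "fries", "burger")
quality <- c("good", "best", "good", "bad")

# Keep both columns as character vectors
input <- data.frame(food_item, quality, stringsAsFactors = FALSE)

# Convert only the quality column, with a custom level order
input$quality <- factor(input$quality, levels = c("bad", "good", "best"))

# Check which columns are factors
factor_columns <- sapply(input, is.factor)
print(factor_columns)
```

Converting column by column makes the choice of levels explicit and leaves free-text columns such as food_item as ordinary strings.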

Using Factors

Let's go through some built-in R functions to see how factors are used.

Table Function

The table() function in R programming tabulates categorical data by counting how often each value, or each combination of values, occurs.

Example:

# Create vectors for the data frame
food_item<- c("pizza", "pasta", "fries", "burger")
quality<- c("good", "best", "good", "bad")

# Create the data frame 
input<- data.frame(food_item, quality)

# Calling the table function
table(input)


Output:

Table function Output

Explanation:

The table() function cross-tabulates the two columns of the data frame, counting how many rows have each combination of food_item and quality.
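Applied to a single factor, table() counts each level, including levels that never occur in the data. In the sketch below, the extra level "average" is a hypothetical category added to show this.

```r
# "average" is a valid level here but does not appear in the data
quality <- factor(c("good", "best", "good", "bad"),
                  levels = c("bad", "average", "good", "best"))

counts <- table(quality)
print(counts)
```

The zero count for "average" would be lost if quality were a plain character vector, since table() could then only see the values actually present.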

Summary Function

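The summary() function gives a short summary of an object, and the form of the summary depends on the object's class. For a factor, it reports the count of each level along with the number of missing values (NA), which helps to find missing data.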

Example:

food_item <- c("pizza", "burger", "pasta", "fries")
quantity <- c(2, 5, 3, 4)

summary(food_item)
summary(quantity)


Output:

 summary() function output

Explanation:

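The summary function is called for food_item and quantity. Because food_item is a character vector, its summary shows only the length, class, and mode; for the numeric vector quantity, the summary shows the minimum, quartiles, median, mean, and maximum.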
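For comparison, here is a small sketch of summary() applied to a factor that contains a missing value. The data is made up for illustration.

```r
# A factor with one missing value
quality <- factor(c("good", "best", NA, "good", "bad"))

quality_summary <- summary(quality)
print(quality_summary)
```

The result has one count per level plus an entry named NA's for the missing value.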

Reordering Factor Levels in R

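To reorder the levels of a factor in R programming, the factor() function is used again, this time with the levels argument listing the existing levels in the new order. Let's see this with the help of an example.

Example:

# Create data frame called input
input <- data.frame(food = factor(c('pizza', 'pasta', 'fries', 'burger')),
                    sales = c(7, 8, 10, 14))

# View data frame
input

# Display the default (alphabetical) factor levels of food
levels(input$food)

# Re-order the factor levels; every existing level must be listed
input$food <- factor(input$food, levels = c('pizza', 'pasta', 'fries', 'burger'))

# Display the re-ordered factor levels of food
levels(input$food)

# Sort the rows of the data frame by the new level order
input <- input[order(input$food), ]


Output:

    food sales
1  pizza     7
2  pasta     8
3  fries    10
4 burger    14
[1] "burger" "fries"  "pasta"  "pizza"
[1] "pizza"  "pasta"  "fries"  "burger"

Explanation:

The levels of the food factor are reordered from the default alphabetical order to pizza, pasta, fries, burger. Any value whose level is left out of the levels argument, or misspelled there, would become NA.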
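If only one level needs to move to the front, for instance to act as the reference category in a model, base R also provides relevel(). A minimal sketch, with "pizza" picked arbitrarily as the reference:

```r
food <- factor(c("pizza", "pasta", "fries", "burger"))

# Move "pizza" to the first position; the other levels keep their order
releveled <- relevel(food, ref = "pizza")

print(levels(food))
print(levels(releveled))
```

Unlike factor() with a full levels argument, relevel() needs only the one level being moved, so there is less room for a misspelled or forgotten level.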

Frequently Asked Questions

What is R programming language?

R is an open-source programming language and environment for statistical computing, data analysis, and data science.

What are the different data structures in R programming?

The main data structures in R programming are vectors, matrices, arrays, lists, data frames, and factors.

What is the use of factors in R programming?

Factors are used to store categorical data, and they provide an efficient way to keep repeated character values. Most importantly, they are helpful in statistical analysis, where categorical variables need to be identified as such.

Which command is used to get the number of levels in factor in R programming?

The nlevels() function in R programming returns the number of levels of a factor.

How do we find that the given vector is a factor or not?

We can check whether a given vector is a factor using the is.factor() function, which returns TRUE or FALSE, or the class() function, which returns "factor" for a factor.

Conclusion

Kudos, Ninjas; you have made it to the end of the blog on Factors in R Programming. We have seen what factors in R programming are, what they are used for, how to create, order, access, and modify them, and which built-in functions show factors in use.
