Introduction
Reading a .json file into a pandas DataFrame with pd.read_json often fails with 'ValueError: Trailing data'. The error typically appears when the file holds several JSON objects, one per line, instead of a single JSON document.
For example, suppose data.json has the following content, and we try to read it with pandas in the usual way.
{"Id": 1, "Language": "Java", "Inventor": "Jame\nGosling" }
{"Id": 2, "Language": "C", "Inventor": "Dennis\nRitchie" }
{"Id": 3, "Language": "C++", "Inventor": "Bjarne\nStroustrup"}
{"Id": 4, "Language": "Python", "Inventor": "Guido\nvan Rossum"}
{"Id": 5, "Language": "PHP", "Inventor": "Rasmus\nLerdorf"}
{"Id": 6, "Language": "Javascript", "Inventor": "Brendan\nEich"}
Now, we will import this JSON file into pandas.
import pandas as pd
# Default mode: the whole file must be a single JSON document
data = pd.read_json('data.json')
print(data.head())
This raises the following error:
ValueError: Trailing data
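This happens because, by default, read_json expects the file to contain a single JSON document, while our file holds six separate objects, one per line. After parsing the first object, the parser finds more data and raises the error. The '\n' sequences inside the "Inventor" values are valid JSON escapes and are not the cause.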
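The same behaviour can be reproduced without pandas. The sketch below uses Python's standard json module rather than the parser pandas uses internally, so the error message differs ("Extra data" instead of "Trailing data"). The variable names raw, first and whole_file_error are made up for this illustration, and only the first two records of the file are included.

```python
import json

# First two records of data.json, one JSON object per line.
# "\\n" in Python source is the two-character escape \n stored in the file.
raw = (
    '{"Id": 1, "Language": "Java", "Inventor": "Jame\\nGosling"}\n'
    '{"Id": 2, "Language": "C", "Inventor": "Dennis\\nRitchie"}\n'
)

# A single line is valid JSON on its own, escaped \n included.
first = json.loads(raw.splitlines()[0])

# The text as a whole is not: a second object follows the first one.
try:
    json.loads(raw)
    whole_file_error = None
except json.JSONDecodeError as exc:  # JSONDecodeError is a subclass of ValueError
    whole_file_error = exc.msg

print(first["Language"])
print(whole_file_error)
```

Each line parses fine by itself, but the parser stops with an error as soon as it meets the second object.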
Fixing the Error
To fix this ValueError, pass the additional argument lines=True when reading the dataset, so that pandas reads the file as one JSON object per line.
The code below reads the dataset correctly.
# lines=True: treat each line of the file as a separate JSON object
data = pd.read_json('data.json', lines=True)
print(data.head(6))
This prints:
   Id    Language            Inventor
0   1        Java       Jame\nGosling
1   2           C     Dennis\nRitchie
2   3         C++  Bjarne\nStroustrup
3   4      Python   Guido\nvan Rossum
4   5         PHP     Rasmus\nLerdorf
5   6  Javascript       Brendan\nEich
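To make "one JSON object per line" concrete, here is a minimal sketch that parses such text by hand with the standard json module. It is an illustration of the idea, not how pandas implements lines=True. The helper name parse_json_lines and the variable raw are hypothetical, and only two records are included.

```python
import json

# Two records in the same one-object-per-line layout as data.json.
raw = (
    '{"Id": 1, "Language": "Java", "Inventor": "Jame\\nGosling"}\n'
    '{"Id": 2, "Language": "C", "Inventor": "Dennis\\nRitchie"}\n'
)


def parse_json_lines(text):
    """Parse every non-empty line as its own JSON document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_json_lines(raw)

print(len(records))
print(records[1]["Language"])
# The escaped \n from the file is a real newline character after parsing.
print("\n" in records[0]["Inventor"])
```

The resulting list of dictionaries can be passed to pd.DataFrame to build a table by hand, although pd.read_json with lines=True is the simpler route.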