Pandas to_json() Redundant Backslashes
Solution 1:
I had the same issue. The solution takes 3 steps.
1- Build the DataFrame from a CSV or, in my case, from an xlsx file:
import json
import pandas as pd
excel_df = pd.read_excel(dataset, sheet_name=my_sheet_name)
2- Convert it to a JSON string (date_format='iso' matters if you have dates in your data):
json_str = excel_df.to_json(orient='records', date_format='iso')
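3- The most important step: json.loads. to_json returns a string, so parse it back into Python objects before serializing it again; otherwise the string is encoded a second time and its quotes get backslashes.
parsed = json.loads(json_str)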
4- (Optional) You can then write or send the JSON, for example write it to a local file:
with open(out, 'w') as json_file:
    json_file.write(json.dumps({"data": parsed}, indent=4))
More info: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html
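To see why step 3 matters, here is a minimal sketch comparing the two paths. The tiny DataFrame is made up for illustration and stands in for the Excel data; only to_json, json.loads and json.dumps come from the steps above.

```python
import json
import pandas as pd

# Hypothetical stand-in for the DataFrame read from Excel
df = pd.DataFrame({"name": ["Avatar"], "year": [2009]})
json_str = df.to_json(orient="records")  # this is already a JSON string

# Without json.loads: the string is encoded a second time, so its quotes are escaped
double_encoded = json.dumps({"data": json_str})

# With json.loads: the records are real Python objects and nest cleanly
parsed = json.loads(json_str)
clean = json.dumps({"data": parsed})

print(double_encoded)
print(clean)
```

The first printed line should contain the backslashes, the second should not.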
Solution 2:
Pandas is escaping the " character because it thinks the values in the json columns are text. To get the desired behaviour, simply parse the values in the json column as json.
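A quick way to check this is to serialize the same column before and after parsing. This is only a sketch: the one-row CSV below is invented for the demonstration and is read from memory with io.StringIO instead of from a file.

```python
import io
import json
import pandas as pd

# Hypothetical one-row CSV whose "cast" column holds a JSON string
csv_text = 'movie_id,title,cast\n1,Example,"[{""name"": ""A"", ""order"": 0}]"\n'
df = pd.read_csv(io.StringIO(csv_text))

# The column is plain text here, so pandas escapes the inner quotes
escaped = df.to_json(orient="records")

# After parsing, the column holds lists of dicts and is written as nested JSON
df["cast"] = df["cast"].apply(json.loads)
nested = df.to_json(orient="records")

print(escaped)
print(nested)
```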
Let the file data.csv have the following content (the inner quotes are escaped by doubling them, as CSV requires).
# data.csv
movie_id,title,cast
19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""credit_id"": ""5602a8a7c3a3685532001c9a"", ""gender"": 2, ""id"": 65731, ""name"": ""Sam Worthington"", ""order"": 0}, {""cast_id"": 3, ""character"": ""Neytiri"", ""credit_id"": ""52fe48009251416c750ac9cb"", ""gender"": 1, ""id"": 8691, ""name"": ""Zoe Saldana"", ""order"": 1}, {""cast_id"": 25, ""character"": ""Dr. Grace Augustine"", ""credit_id"": ""52fe48009251416c750aca39"", ""gender"": 1, ""id"": 10205, ""name"": ""Sigourney Weaver"", ""order"": 2}, {""cast_id"": 4, ""character"": ""Col. Quaritch"", ""credit_id"": ""52fe48009251416c750ac9cf"", ""gender"": 2, ""id"": 32747, ""name"": ""Stephen Lang"", ""order"": 3}]"
Read this into a dataframe, then apply the json.loads function to the cast column and write the result out to a file as JSON.
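import json
import pandas as pd
df = pd.read_csv('data.csv')
# Parse each JSON string into Python lists/dicts so to_json nests it instead of escaping it
df['cast'] = df['cast'].apply(json.loads)
df.to_json('data.json', orient='records', lines=True)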
The output is properly formatted JSON, one record per line (extra newlines added by me):
# data.json
{"movie_id":19995,
"title":"Avatar",
"cast":[{"cast_id":242,"character":"Jake Sully","credit_id":"5602a8a7c3a3685532001c9a","gender":2,"id":65731,"name":"Sam Worthington","order":0},
{"cast_id":3,"character":"Neytiri","credit_id":"52fe48009251416c750ac9cb","gender":1,"id":8691,"name":"Zoe Saldana","order":1},
{"cast_id":25,"character":"Dr. Grace Augustine","credit_id":"52fe48009251416c750aca39","gender":1,"id":10205,"name":"Sigourney Weaver","order":2},
{"cast_id":4,"character":"Col. Quaritch","credit_id":"52fe48009251416c750ac9cf","gender":2,"id":32747,"name":"Stephen Lang","order":3}]
}
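Because lines=True writes one JSON object per line, the file can be checked by parsing it line by line. The sketch below assumes a small made-up DataFrame (the second row is hypothetical) and writes to a temporary directory so it does not touch data.json.

```python
import json
import os
import tempfile
import pandas as pd

# Made-up data with an already-parsed "cast" column; the second row is hypothetical
df = pd.DataFrame({
    "movie_id": [19995, 2],
    "title": ["Avatar", "Example"],
    "cast": [
        [{"name": "Sam Worthington", "order": 0}],
        [{"name": "Hypothetical Actor", "order": 0}],
    ],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    df.to_json(path, orient="records", lines=True)
    # Each non-empty line is a complete JSON document on its own
    with open(path) as fh:
        records = [json.loads(line) for line in fh if line.strip()]

print(records[0]["cast"][0]["name"])
```

If the cast column had been left as text, records[0]["cast"] would come back as a string rather than a list.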