Pandas 使用教程 JSON

技术分享 3年前 (2023-08-29) 0 999+

Pandas 可以很方便的处理 JSON 数据

demo.json

[     {         "name":"张三",         "age":23,         "gender":true     },     {         "name":"李四",         "age":24,         "gender":true     },     {         "name":"王五",         "age":25,         "gender":false     } ]

JSON 转换为 CSV

非常方便，只要通过 pd.read_json 读出JSON数据，再通过 df.to_csv 写入 CSV 即可

import pandas as pd  json_path = 'data/demo.json'  # 加载 JSON 数据 with open(json_path, 'r', encoding='utf8') as f:     # 解析一个有效的JSON字符串并将其转换为Python字典     df = pd.read_json(f.read())     print(df.to_string())  # to_string() 用于返回 DataFrame 类型的数据，我们也可以直接处理 JSON 字符串。     print('-' * 10)      # 重新定义标题     df.columns = ['姓名', '年龄', '性别']     print(df)      df.to_csv('data/result.csv', index=False, encoding='GB2312')

简单 JSON

从 URL 中读取 JSON 数据：

import pandas as pd  URL = 'https://static.runoob.com/download/sites.json' df = pd.read_json(URL) # 和读文件一样 print(df)

输出：

     id    name             url  likes 0  A001    菜鸟教程  www.runoob.com     61 1  A002  Google  www.google.com    124 2  A003      淘宝  www.taobao.com     45

字典转化为 DataFrame 数据

import pandas as pd  s = {     "col1": {"row1": 1, "row2": 2, "row3": 3},     "col2": {"row1": "x", "row2": "y", "row4": "z"} }  df = pd.DataFrame(s) print(df) print('-' * 10)  new_df = df.dropna()  # 数据清洗，删除包含空数据的行 print(new_df.to_string()) print('-' * 10)  df.fillna(99, inplace=True)  # fillna() 方法来替换一些空字段 print(df.to_string())

输出：不同的行会用 NaN 填充

      col1 col2 row1   1.0    x row2   2.0    y row3   3.0  NaN row4   NaN    z ----------       col1 col2 row1   1.0    x row2   2.0    y ----------       col1 col2 row1   1.0    x row2   2.0    y row3   3.0   99 row4  99.0    z

内嵌的 JSON 数据

nested_list.json 嵌套的JSON数据

{   "school_name": "ABC primary school",   "class": "Year 1",   "students": [     {       "id": "A001",       "name": "Tom",       "math": 60,       "physics": 66,       "chemistry": 61     },     {       "id": "A002",       "name": "James",       "math": 89,       "physics": 76,       "chemistry": 51     },     {       "id": "A003",       "name": "Jenny",       "math": 79,       "physics": 90,       "chemistry": 78     }   ] }

运行代码
data = json.loads(f.read()) 使用 Python JSON 模块载入数据。
json_normalize() 使用了参数 record_path 并设置为 ['students'] 用于展开内嵌的 JSON 数据 students。

import pandas as pd import json  # 打印出结果JSON结构 with open('data/nested_list.json', 'r') as f:     data = pd.read_json(f.read())     print(data)  # 使用 Python JSON 模块载入数据 with open('data/nested_list.json', 'r') as f:     data = json.loads(f.read())  # 展平数据-- json_normalize() 方法将内嵌的数据完整的解析出来： df_nested_list = pd.json_normalize(data, record_path=['students']) print(df_nested_list)

import pandas as pd import json  data_path = 'data/nested_list.json'  print(('-' * 10) + ' 连同上级JSON值一起显示') # 使用 Python JSON 模块载入数据 with open(data_path, 'r') as f:     data = json.loads(f.read())  # 展平数据 df_nested_list = pd.json_normalize(     data,     record_path=['students'],     meta=['school_name', 'class'] ) print(df_nested_list)

复杂 JSON

该数据嵌套了列表和字典，数据文件 nested_mix.json 如下
nested_mix.json

{     "school_name": "local primary school",     "class": "Year 1",     "info": {       "president": "John Kasich",       "address": "ABC road, London, UK",       "contacts": {         "email": "admin@e.com",         "tel": "123456789"       }     },     "students": [     {         "id": "A001",         "name": "Tom",         "math": 60,         "physics": 66,         "chemistry": 61     },     {         "id": "A002",         "name": "James",         "math": 89,         "physics": 76,         "chemistry": 51     },     {         "id": "A003",         "name": "Jenny",         "math": 79,         "physics": 90,         "chemistry": 78     }] }

import pandas as pd import json  # 使用 Python JSON 模块载入数据 with open('data/nested_mix.json', 'r') as f:     data = json.loads(f.read())  df = pd.json_normalize(     data,     record_path=['students'],     meta=[         'class',         ['info', 'president'],  # 类似 info.president         ['info', 'contacts', 'tel']     ] )  print(df)

     id   name  math  ...   class  info.president info.contacts.tel 0  A001    Tom    60  ...  Year 1     John Kasich         123456789 1  A002  James    89  ...  Year 1     John Kasich         123456789 2  A003  Jenny    79  ...  Year 1     John Kasich         123456789  [3 rows x 8 columns]

读取内嵌数据中的一组数据
nested_deep.json

{     "school_name": "local primary school",     "class": "Year 1",     "students": [     {         "id": "A001",         "name": "Tom",         "grade": {             "math": 60,             "physics": 66,             "chemistry": 61         }       },     {         "id": "A002",         "name": "James",         "grade": {             "math": 89,             "physics": 76,             "chemistry": 51         }             },     {         "id": "A003",         "name": "Jenny",         "grade": {             "math": 79,             "physics": 90,             "chemistry": 78         }     }] }

这里我们需要使用到 glom 模块来处理数据套嵌，glom 模块允许我们使用 . 来访问内嵌对象的属性。

第一次使用我们需要安装 glom：
pip3 install glom -i https://pypi.tuna.tsinghua.edu.cn/simple

import pandas as pd from glom import glom  df = pd.read_json('nested_deep.json')  data = df['students'].apply(lambda row: glom(row, 'grade.math')) print(data)

输出：