Transportation Spatio-Temporal Big Data: Applications

Speeding Vehicles

overspeed_vehicles.py
See also: Counting the amount of data per hour

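  • Set the speeding threshold. Pick one of the following. Note that a quantile threshold is relative to the data: with the first quartile, about 75% of the records exceed it.
    # Fixed value
    threshold_speed = 60
    # First quartile
    threshold_speed = data['speed'].quantile(0.25)
    # Median
    threshold_speed = data['speed'].median()
  • Filter the data
    # Count the over-threshold records in each hour
    data[data['speed'] > threshold_speed].groupby('Hour')['car_id'].count()
    • data[data['speed'] > threshold_speed]: selects every record whose speed exceeds the threshold
    • .groupby('Hour'): groups the records by hour (the Hour column)
    • ['car_id'].count(): counts the records in each hour. A vehicle with several over-threshold GPS points in the same hour is counted once per point; use .nunique() instead of .count() to count distinct vehicles.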
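
To make the difference between counting records and counting vehicles concrete, here is a minimal sketch on made-up data. The values and the fixed threshold of 60 are illustrative assumptions and are not taken from merge0509.csv.

```python
import pandas as pd

# Toy GPS records; every value here is invented for illustration.
data = pd.DataFrame({
    'car_id': [1, 1, 2, 3, 3, 3],
    'Hour':   [8, 8, 8, 9, 9, 9],
    'speed':  [70, 65, 50, 80, 62, 40],
})
threshold_speed = 60  # assumed fixed threshold

speeding = data[data['speed'] > threshold_speed]

# Number of over-threshold records per hour
records_per_hour = speeding.groupby('Hour')['car_id'].count()
# Number of distinct vehicles with at least one over-threshold record per hour
vehicles_per_hour = speeding.groupby('Hour')['car_id'].nunique()

print(records_per_hour.to_dict())
print(vehicles_per_hour.to_dict())
```

With dense GPS sampling the two numbers can differ a lot, so it is worth deciding which one the chart is meant to show.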

Complete code

import pandas as pd
import matplotlib.pyplot as plt

# Path to the data
input_data = 'merge0509.csv'
data = pd.read_csv(input_data)

# Parse the timestamps and extract the hour
data['_time'] = pd.to_datetime(data['_time'])
data['Hour'] = data['_time'].dt.hour

# Use the first quartile of all recorded speeds as the speeding threshold
threshold_speed = data['speed'].quantile(0.25)

# Count the over-threshold records in each hour
hourly_speeding_count = data[data['speed'] > threshold_speed].groupby('Hour')['car_id'].count()

# Plot
plt.figure(figsize=(8, 4))
plt.plot(hourly_speeding_count.index, hourly_speeding_count.values, 'r-')
plt.xlabel('Hour')
plt.ylabel('Number of Speeding Vehicles')
plt.title('Hourly Speeding Vehicles Count')
plt.xticks(range(24))
plt.grid(True)
plt.show()

Trip Characteristics Analysis

Extracting OD data (pre-cleaned data)

Convert the GPS data into a format keyed by car_id:
one row per GPS point -> one row per vehicle's OD (origin-destination) record

If the DataFrame is large, print() truncates the output. Convert it to a string to print everything: print(start_end_points.to_string())

  • Group the GPS records by vehicle and take the first and last record of each group as the origin and destination. The records must already be sorted by time within each car_id (gps_data_sorted).

    start_end_points = (
        gps_data_sorted.groupby('car_id')
        .agg({'lat': 'first', 'lon': 'first', '_time': 'first'})
        .reset_index()
        .rename(columns={'lat': 'start_lat', 'lon': 'start_lon', '_time': 'start_time'})
    )
    end_points = (
        gps_data_sorted.groupby('car_id')
        .agg({'lat': 'last', 'lon': 'last', '_time': 'last'})
        .reset_index()
        .rename(columns={'lat': 'end_lat', 'lon': 'end_lon', '_time': 'end_time'})
    )
    • agg(): aggregation function
    • 'first': takes the first non-null value in each group
    • 'last': takes the last non-null value in each group
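
The two agg() calls can also be written as a single call with pandas named aggregation, which produces the start and end columns together and needs no merge step afterwards. This is a minimal sketch on invented data; it assumes the same column names (car_id, lat, lon, _time), and the coordinates and timestamps are made up.

```python
import pandas as pd

# Toy GPS records, deliberately out of order; all values are invented.
gps_data = pd.DataFrame({
    'car_id': [2, 1, 1, 2, 1],
    'lat':    [22.70, 22.60, 22.61, 22.72, 22.62],
    'lon':    [120.30, 120.20, 120.21, 120.32, 120.22],
    '_time':  pd.to_datetime([
        '2023-05-09 09:00', '2023-05-09 08:00', '2023-05-09 08:05',
        '2023-05-09 09:20', '2023-05-09 08:10',
    ]),
})

# 'first' and 'last' depend on row order, so sort by vehicle and time first.
gps_data_sorted = gps_data.sort_values(by=['car_id', '_time'])

# Named aggregation: output_column=(input_column, aggregation)
start_end_points = gps_data_sorted.groupby('car_id').agg(
    start_lat=('lat', 'first'),
    start_lon=('lon', 'first'),
    start_time=('_time', 'first'),
    end_lat=('lat', 'last'),
    end_lon=('lon', 'last'),
    end_time=('_time', 'last'),
).reset_index()

print(start_end_points.to_string())
```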

Complete code

import pandas as pd

# Path to the data
input_data = 'C:\\NCKU\\data_analyze\\NSYU_september\\send\\merge0509.csv'
gps_data = pd.read_csv(input_data)

# Convert the timestamp column to datetime
gps_data['_time'] = pd.to_datetime(gps_data['_time'])

# Sort each vehicle's GPS records by time
gps_data_sorted = gps_data.sort_values(by=['car_id', '_time'])

# Group by vehicle and take the first and last GPS record of each group as origin and destination
# agg(): aggregation function; 'first' / 'last': first / last non-null value in the group
start_end_points = (
    gps_data_sorted.groupby('car_id')
    .agg({'lat': 'first', 'lon': 'first', '_time': 'first'})
    .reset_index()
    .rename(columns={'lat': 'start_lat', 'lon': 'start_lon', '_time': 'start_time'})
)
end_points = (
    gps_data_sorted.groupby('car_id')
    .agg({'lat': 'last', 'lon': 'last', '_time': 'last'})
    .reset_index()
    .rename(columns={'lat': 'end_lat', 'lon': 'end_lon', '_time': 'end_time'})
)

# Merge origin and destination (both frames are ordered by car_id, so the rows align)
start_end_points['end_lat'] = end_points['end_lat']
start_end_points['end_lon'] = end_points['end_lon']
start_end_points['end_time'] = end_points['end_time']

# Show the origin and destination data
# print(start_end_points)
print(start_end_points.to_string())

Extracting OD data (raw, uncleaned data)

  • Compute the trip duration directly as end_time - start_time

    odData['duration'] = odData['end_time'] - odData['start_time']
    

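    • Some durations are implausibly long, probably because the scooter was rented more than once that day
    • The OD data therefore has to be split into separate trips first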
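  • Set a time threshold

    If the gap between consecutive GPS records of the same vehicle exceeds this threshold, the records are treated as belonging to different OD trips.

    # Time threshold (the complete scripts in this note use 15 minutes)
    time_threshold = pd.Timedelta(minutes=30)
    # Group by 'car_id' and compute the gap between consecutive GPS records in each group
    # (gps_data must already be sorted by car_id and _time)
    gps_data['time_diff'] = gps_data.groupby('car_id')['_time'].diff()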
  • Write the splitting condition

    od_data = []
    current_od = None
    for index, row in gps_data.iterrows():
        if pd.isnull(row['time_diff']) or row['time_diff'] > time_threshold:
            if current_od is not None:  # a trip is in progress; store it before starting a new one
                od_data.append(current_od)
            current_od = {'car_id': row['car_id'], 'start_lat': row['lat'], 'start_lon': row['lon'], 'start_time': row['_time']}
        if current_od is not None:  # update the end point of the current trip
            current_od['end_lat'] = row['lat']
            current_od['end_lon'] = row['lon']
            current_od['end_time'] = row['_time']

    # Append the last trip to od_data
    if current_od is not None:
        od_data.append(current_od)
    • The break points are not known in advance, so the rows have to be checked one by one
      for index, row in gps_data.iterrows(): iterates over the rows
      Each iteration yields an (index, Series) pair

      • index is the row label
      • Series is a pandas Series holding that row's values
    • if pd.isnull(row['time_diff']) or row['time_diff'] > time_threshold: decides whether the row starts a new OD trip

      • pd.isnull(row['time_diff']): the first record of a vehicle (diff() has no previous record, so the gap is NaT)
      • row['time_diff'] > time_threshold: the gap exceeds the threshold
    • Appending the last trip to od_data: no new trip starts after the final record, so without this step the last trip would never be stored

  • Convert the OD data to a DataFrame and compute the duration

    od_df = pd.DataFrame(od_data)
    od_df['duration'] = od_df['end_time'] - od_df['start_time']

Comparing the results

image

Vehicle 31902415 was in fact rented many times, rather than ridden for 10 hours in a single trip.
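
The row-by-row loop is easy to follow but slow on large files. The same splitting rule can be expressed without an explicit loop: flag the rows that start a trip, take a running sum of the flags to number the trips, then aggregate per trip. This is a sketch of an alternative, not the code used in this note; the toy records and the trip_id column name are assumptions.

```python
import pandas as pd

# Toy GPS records; all values are invented for illustration.
gps_data = pd.DataFrame({
    'car_id': [1, 1, 1, 1, 2, 2],
    'lat':    [22.60, 22.61, 22.65, 22.66, 22.70, 22.71],
    'lon':    [120.20, 120.21, 120.25, 120.26, 120.30, 120.31],
    '_time':  pd.to_datetime([
        '2023-05-09 08:00', '2023-05-09 08:05',   # car 1, first trip
        '2023-05-09 10:00', '2023-05-09 10:10',   # car 1, second trip (gap > 30 min)
        '2023-05-09 09:00', '2023-05-09 09:20',   # car 2, only trip
    ]),
})
time_threshold = pd.Timedelta(minutes=30)

# The gap calculation only makes sense on data sorted by vehicle and time.
gps_data = gps_data.sort_values(by=['car_id', '_time']).reset_index(drop=True)
time_diff = gps_data.groupby('car_id')['_time'].diff()

# A row starts a new trip if it is the vehicle's first record (NaT gap)
# or if the gap to the previous record exceeds the threshold.
new_trip = time_diff.isna() | (time_diff > time_threshold)
# The running count of trip starts gives every row a trip number.
gps_data['trip_id'] = new_trip.cumsum()

od_df = gps_data.groupby(['car_id', 'trip_id']).agg(
    start_lat=('lat', 'first'), start_lon=('lon', 'first'), start_time=('_time', 'first'),
    end_lat=('lat', 'last'), end_lon=('lon', 'last'), end_time=('_time', 'last'),
).reset_index()
od_df['duration'] = od_df['end_time'] - od_df['start_time']

print(od_df.to_string())
```

On this toy input, car 1 is split into two trips and car 2 keeps a single trip, which is the same result the loop would give.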


Time statistics

time_statistics.py

  • Put the rental start hour on the x-axis and the minutes of use on the y-axis
    # Convert the duration to minutes
    od_df['duration_minutes'] = od_df['duration'].dt.total_seconds() / 60
    # Bucket each trip by the hour of its start_time
    od_df['Hour'] = pd.to_datetime(od_df['start_time']).dt.hour
  • Draw the box plot
    The plot is drawn with sns.boxplot
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Box plot: sns.boxplot
    fig = plt.figure(1, (6, 4), dpi=100)
    ax = plt.subplot(111)
    plt.sca(ax)
    sns.boxplot(x='Hour', y='duration_minutes', data=od_df, ax=ax)
    plt.ylabel('Order time (minutes)')
    plt.xlabel('Order start time')
    plt.ylim(0, 120)
    plt.show()

image
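
Each box in the plot spans the first to third quartile of duration_minutes for that start hour, with the median drawn as the line inside the box. The same numbers can be computed directly, which helps when reading the chart. This is a minimal sketch on invented data; the column names q1, median and q3 are chosen here for illustration.

```python
import pandas as pd

# Toy trips; the hours and durations are invented.
od_df = pd.DataFrame({
    'Hour': [8, 8, 8, 8, 8, 9, 9, 9],
    'duration_minutes': [5, 10, 15, 20, 25, 30, 40, 50],
})

# Quartiles of the trip duration for every start hour
box_stats = (
    od_df.groupby('Hour')['duration_minutes']
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
box_stats.columns = ['q1', 'median', 'q3']

print(box_stats)
```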

Time statistics: complete code

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Path to the data
input_data = 'merge0509.csv'
gps_data = pd.read_csv(input_data)

# Convert the timestamp column to datetime
gps_data['_time'] = pd.to_datetime(gps_data['_time'])

# Sort each vehicle's GPS records by time; the gap calculation and the loop below rely on this order
gps_data = gps_data.sort_values(by=['car_id', '_time']).reset_index(drop=True)

# Time threshold: a gap between consecutive GPS records larger than this starts a new OD trip
time_threshold = pd.Timedelta(minutes=15)

# Group by 'car_id' and compute the gap between consecutive GPS records in each group
gps_data['time_diff'] = gps_data.groupby('car_id')['_time'].diff()

# Split the records into OD trips wherever 'time_diff' is missing or exceeds the threshold
od_data = []
current_od = None
for index, row in gps_data.iterrows():
    if pd.isnull(row['time_diff']) or row['time_diff'] > time_threshold:
        if current_od is not None:
            od_data.append(current_od)
        current_od = {'car_id': row['car_id'], 'start_lat': row['lat'], 'start_lon': row['lon'], 'start_time': row['_time']}
    if current_od is not None:
        current_od['end_lat'] = row['lat']
        current_od['end_lon'] = row['lon']
        current_od['end_time'] = row['_time']

# Append the last trip to od_data
if current_od is not None:
    od_data.append(current_od)

# Convert the OD data to a DataFrame
od_df = pd.DataFrame(od_data)
od_df['duration'] = od_df['end_time'] - od_df['start_time']

# Show the OD data
print(od_df.to_string())

# Convert the duration to minutes
od_df['duration_minutes'] = od_df['duration'].dt.total_seconds() / 60
# Bucket each trip by the hour of its start_time
od_df['Hour'] = pd.to_datetime(od_df['start_time']).dt.hour

# Box plot: sns.boxplot
fig = plt.figure(1, (6, 4), dpi=100)
ax = plt.subplot(111)
plt.sca(ax)
sns.boxplot(x='Hour', y='duration_minutes', data=od_df, ax=ax)
plt.ylabel('Order time (minutes)')
plt.xlabel('Order start time')
plt.ylim(0, 120)
plt.show()

Visualization

visualization.py

image

  • TransBigData expects specific column names, so build a new DataFrame in that format before using its plotting functions
    oddata_tbd = od_df[['car_id', 'start_time', 'start_lon', 'start_lat', 'end_time', 'end_lon', 'end_lat']].rename(
        columns={'car_id': 'VehicleNum', 'start_time': 'stime', 'start_lon': 'slon', 'start_lat': 'slat',
                 'end_time': 'etime', 'end_lon': 'elon', 'end_lat': 'elat'})

Visualization: complete code

import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
import transbigdata as tbd

# Paths to the data
input_data = 'merge0509.csv'
input_shp = 'shp\\kuohsiung.shp'
gps_data = pd.read_csv(input_data)
sz = gpd.read_file(input_shp)
sz = sz.to_crs(epsg=4326)

# accuracy = 500
# gps_data = tbd.clean_outofshape(gps_data, sz, col=['lon', 'lat'], accuracy = accuracy)

# Convert the timestamp column to datetime
gps_data['_time'] = pd.to_datetime(gps_data['_time'])

# Sort each vehicle's GPS records by time; the gap calculation and the loop below rely on this order
gps_data = gps_data.sort_values(by=['car_id', '_time']).reset_index(drop=True)

# Time threshold: a gap between consecutive GPS records larger than this starts a new OD trip
time_threshold = pd.Timedelta(minutes=15)

# Group by 'car_id' and compute the gap between consecutive GPS records in each group
gps_data['time_diff'] = gps_data.groupby('car_id')['_time'].diff()

# Split the records into OD trips wherever 'time_diff' is missing or exceeds the threshold
od_data = []
current_od = None
for index, row in gps_data.iterrows():
    if pd.isnull(row['time_diff']) or row['time_diff'] > time_threshold:
        if current_od is not None:
            od_data.append(current_od)
        current_od = {'car_id': row['car_id'], 'start_lat': row['lat'], 'start_lon': row['lon'], 'start_time': row['_time']}
    if current_od is not None:
        current_od['end_lat'] = row['lat']
        current_od['end_lon'] = row['lon']
        current_od['end_time'] = row['_time']

# Append the last trip to od_data
if current_od is not None:
    od_data.append(current_od)

# Convert the OD data to a DataFrame
od_df = pd.DataFrame(od_data)
od_df['duration'] = od_df['end_time'] - od_df['start_time']

# Convert the duration to minutes
od_df['duration_minutes'] = od_df['duration'].dt.total_seconds() / 60
# Bucket each trip by the hour of its start_time
od_df['Hour'] = pd.to_datetime(od_df['start_time']).dt.hour

# TransBigData expects specific column names, so build a new DataFrame in that format
oddata_tbd = od_df[['car_id', 'start_time', 'start_lon', 'start_lat', 'end_time', 'end_lon', 'end_lat']].rename(
    columns={'car_id': 'VehicleNum', 'start_time': 'stime', 'start_lon': 'slon', 'start_lat': 'slat',
             'end_time': 'etime', 'end_lon': 'elon', 'end_lat': 'elat'})

# Get the gridding parameters
minx, miny, maxx, maxy = sz.total_bounds
bounds = [minx, miny, maxx, maxy]
params = tbd.area_to_params(bounds, accuracy=1000)

# Map the OD records to the grid and aggregate them
od_gdf = tbd.odagg_grid(oddata_tbd, params)
# od_gdf.plot(column = 'count')

# Create the figure
fig = plt.figure(1, (8, 8), dpi=100)
ax = plt.subplot(111)
plt.sca(ax)

# Add the basemap
# tbd.plot_map(plt,bounds,zoom = 11,style = 11)
sz.plot(ax=ax, edgecolor=(0, 0, 0, 0), facecolor=(0, 0, 0, 0.2), linewidths=0.5)

# Axes for the colorbar
cax = plt.axes([0.05, 0.33, 0.02, 0.3])
plt.title('Data count')
plt.sca(ax)

# Plot the OD flows
od_gdf.plot(ax=ax, column='count', cmap='Blues', linewidth=0.5, vmax=5, cax=cax, legend=True)

# Add the scale bar and north arrow
tbd.plotscale(ax, bounds=bounds, textsize=10, compasssize=1, accuracy=2000, rect=[0.9, 0.03], zorder=10)
plt.axis('off')
plt.xlim(bounds[0], bounds[2])
plt.ylim(bounds[1], bounds[3])
plt.show()

Usage Characteristics Analysis

Riding distance: kernel density analysis

od_data_analysis.py

image

  • Straight-line distance between the origin and destination coordinates, in meters
    import transbigdata as tbd
    od_df['distance'] = tbd.getdistance(od_df['start_lon'], od_df['start_lat'], od_df['end_lon'], od_df['end_lat'])
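
For readers who want to see what a straight-line distance between two longitude/latitude points involves, the standard haversine formula is sketched below in NumPy. This is an illustration of the general technique and is not claimed to be TransBigData's exact implementation; the function name haversine_m and the Earth radius value are assumptions made here.

```python
import numpy as np

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (assumed value)

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# One degree of longitude along the equator is roughly 111 km.
d = haversine_m(np.array([0.0]), np.array([0.0]), np.array([1.0]), np.array([0.0]))
print(d)
```

Because the function works on arrays, it can be applied to whole columns such as start_lon, start_lat, end_lon and end_lat in one call. The result is the distance between the two end points only, not the length of the route actually ridden.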

Number of rides

image

  • Count how many times each vehicle appears (its number of OD trips)

    od_df['car_id'].value_counts()
    

    image

  • Count how many vehicles share each number of uses (times)

    datatoplot = od_df['car_id'].value_counts().value_counts()
    

    image

  • Complete code
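
The double value_counts() is compact but easy to misread. The first call counts trips per vehicle; the second counts how many vehicles share each trip count. This is a minimal sketch with invented vehicle IDs; the sort_index() call is added here only to make the output order predictable.

```python
import pandas as pd

# Toy OD table: A and B made three trips each, C and D one trip each.
od_df = pd.DataFrame({'car_id': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'D']})

# Trips per vehicle
trips_per_car = od_df['car_id'].value_counts()

# index = number of trips, value = number of vehicles with that many trips
datatoplot = trips_per_car.value_counts().sort_index()

print(trips_per_car.to_dict())
print(datatoplot.to_dict())
```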


Duration per trip

trip_analysis.py
Kernel density analysis

image

  • Keep only the trips that lasted at most one hour
    od_df_filtered = od_df[od_df['duration_minutes'] <= 60]

  • Plot the kernel density with sns.kdeplot()
    sns.kdeplot(od_df_filtered['duration_minutes'])

  • Complete code
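
A kernel density estimate places a small smooth bump on every observation and adds the bumps up, which gives a smoothed version of a histogram. The sketch below shows the idea with a Gaussian kernel in plain NumPy. The function name gaussian_kde_1d, the bandwidth rule (Scott's rule of thumb) and the sample durations are assumptions for illustration; seaborn's own defaults may differ in detail.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for a one-dimensional bandwidth
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, evaluated at every grid point
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)

# Invented trip durations in minutes
duration_minutes = np.array([4, 6, 7, 9, 12, 15, 18, 25, 40, 55])
grid = np.linspace(-100, 200, 3001)
density = gaussian_kde_1d(duration_minutes, grid)

# A density should integrate to about 1 over a wide enough range.
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

Note that the smoothed curve extends slightly below zero minutes even though no trip can be that short; this is a property of the method, not of the data.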

Daily usage duration

Kernel density analysis

image

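  • Sum the durations of all OD trips for each vehicle. This is a daily total only when the data covers a single day.
    timecount = od_df.groupby('car_id')['duration_minutes'].sum()

  • Plot the kernel density with sns.kdeplot()
    sns.kdeplot(timecount / 60, label='Time')  # divide by 60 to show hours on the x-axis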
    

Kernel density: complete code

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Path to the data
input_data = 'merge0509.csv'
gps_data = pd.read_csv(input_data)

# Convert the timestamp column to datetime
gps_data['_time'] = pd.to_datetime(gps_data['_time'])

# Sort each vehicle's GPS records by time; the gap calculation and the loop below rely on this order
gps_data = gps_data.sort_values(by=['car_id', '_time']).reset_index(drop=True)

# Time threshold: a gap between consecutive GPS records larger than this starts a new OD trip
time_threshold = pd.Timedelta(minutes=15)

# Group by 'car_id' and compute the gap between consecutive GPS records in each group
gps_data['time_diff'] = gps_data.groupby('car_id')['_time'].diff()

# Split the records into OD trips wherever 'time_diff' is missing or exceeds the threshold
od_data = []
current_od = None
for index, row in gps_data.iterrows():
    if pd.isnull(row['time_diff']) or row['time_diff'] > time_threshold:
        if current_od is not None:
            od_data.append(current_od)
        current_od = {'car_id': row['car_id'], 'start_lat': row['lat'], 'start_lon': row['lon'], 'start_time': row['_time']}
    if current_od is not None:
        current_od['end_lat'] = row['lat']
        current_od['end_lon'] = row['lon']
        current_od['end_time'] = row['_time']

# Append the last trip to od_data
if current_od is not None:
    od_data.append(current_od)

# Convert the OD data to a DataFrame
od_df = pd.DataFrame(od_data)
od_df['duration'] = od_df['end_time'] - od_df['start_time']

# Convert the duration to minutes
od_df['duration_minutes'] = od_df['duration'].dt.total_seconds() / 60
# Bucket each trip by the hour of its start_time
od_df['Hour'] = pd.to_datetime(od_df['start_time']).dt.hour

# # Distance analysis (straight-line distance between coordinates)
# import transbigdata as tbd
# od_df['distance'] = tbd.getdistance(od_df['start_lon'], od_df['start_lat'], od_df['end_lon'], od_df['end_lat'])
# print(od_df[['car_id', 'duration_minutes', 'distance']].to_string())

# # Kernel density of the distance distribution (used to choose the data-cleaning thresholds)
# fig = plt.figure(1,(7,7),dpi = 100)
# ax1 = plt.subplot(411)
# sns.kdeplot(od_df[od_df['distance']<16000]['distance'])
# plt.xlim(0,15000)
# plt.ylabel('Kernel Density')
# ax2 = plt.subplot(412)
# sns.kdeplot(od_df[od_df['distance']<6000]['distance'])
# plt.xlim(0,5000)
# plt.ylabel('Kernel Density')
# ax3 = plt.subplot(413)
# sns.kdeplot(od_df[od_df['distance']<1500]['distance'])
# plt.xlim(0,1000)
# plt.ylabel('Kernel Density')
# ax4 = plt.subplot(414)
# sns.kdeplot(od_df[od_df['distance']<750]['distance'])
# plt.xlim(0,500)
# plt.xlabel('distance traveled(m)')
# plt.ylabel('Kernel Density')
# plt.show()

# # Number of uses
# datatoplot = od_df['car_id'].value_counts().value_counts()
# fig = plt.figure(1,(7,4),dpi = 150)
# ax1 = plt.subplot(111)
# plt.bar(datatoplot.index,datatoplot)
# plt.xticks(range(0,40,1),range(0,40,1))
# plt.xlim(0,15)
# plt.xlabel('times')
# plt.ylabel('frequency')
# plt.show()

# Usage duration: kernel density analysis
# # Per trip
# fig = plt.figure(1, (7, 4), dpi=150)
# ax1 = plt.subplot(111)
# od_df_filtered = od_df[od_df['duration_minutes'] <= 60]
# # Plot KDE of trip durations in minutes
# sns.kdeplot(od_df_filtered['duration_minutes'])
# plt.xlabel('Time (mins)')
# plt.ylabel('Kernel Density')
# plt.show()

# Per day: total usage time of each vehicle
timecount = od_df.groupby('car_id')['duration_minutes'].sum()

# Kernel density of the total usage time
fig = plt.figure(1, (7, 4), dpi=150)
ax1 = plt.subplot(111)
sns.kdeplot(timecount / 60, label='Time')
plt.legend()
plt.xticks(range(25), range(25))
plt.xlim(0, 10)
plt.ylabel('Kernel Density')
plt.xlabel('Time (h)')
plt.show()

Parking duration and scooter utilization

The shorter the parking time between rentals, the more efficiently the scooters are being used.
utilization_analysis.py

image
image

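  • Get the time from the end of one trip to the vehicle's next rental

    next_rent_time = od_df.groupby('car_id').shift(-1)['start_time']
    next_rent_time = next_rent_time.fillna(collect_end)

    # collect_end is a single timestamp marking the end of the collection window:
    # collect_end = pd.to_datetime(gps_data['_stop']).max()

    • .shift(-1) returns the values of the next row, here within each car_id group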

    image

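    • As the figure above shows, subtracting a trip's end_time from the start_time of the vehicle's next trip gives the parking time
    • If the vehicle has no later trip, the end of the GPS collection window (collect_end) is used instead to work out how long it stayed parked
  • Compute the parking duration, in hours

    od_df['next_rent'] = next_rent_time
    od_df['parking_duration'] = (od_df['next_rent'] - od_df['end_time']).dt.total_seconds() / 3600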
    
  • Plot

    # Show the OD data
    print(od_df[['car_id', 'start_time', 'end_time', 'next_rent', 'parking_duration']].to_string())
    import matplotlib.pyplot as plt
    import seaborn as sns
    fig = plt.figure(1, (7, 4), dpi=150)
    ax1 = plt.subplot(111)
    sns.kdeplot(od_df['parking_duration'], label='parking')
    plt.legend()
    plt.xticks(range(25), range(25))
    plt.xlim(0, 24)
    plt.ylabel('Kernel Density')
    plt.xlabel('parking (hr)')
    plt.show()

Complete code

import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import geopandas as gpd
import transbigdata as tbd
import seaborn as sns

# Paths to the data
input_data = 'merge0509.csv'
input_shp = 'shp\\kuohsiung.shp'
gps_data = pd.read_csv(input_data)
sz = gpd.read_file(input_shp)
sz = sz.to_crs(epsg=4326)

# Convert the timestamp column to datetime
gps_data['_time'] = pd.to_datetime(gps_data['_time'])
# End of the GPS collection window, reduced to a single timestamp
collect_end = pd.to_datetime(gps_data['_stop']).max()

# Sort each vehicle's GPS records by time; the gap calculation and the loop below rely on this order
gps_data = gps_data.sort_values(by=['car_id', '_time']).reset_index(drop=True)

# Time threshold: a gap between consecutive GPS records larger than this starts a new OD trip
time_threshold = pd.Timedelta(minutes=15)

# Group by 'car_id' and compute the gap between consecutive GPS records in each group
gps_data['time_diff'] = gps_data.groupby('car_id')['_time'].diff()

# Split the records into OD trips wherever 'time_diff' is missing or exceeds the threshold
od_data = []
current_od = None
for index, row in gps_data.iterrows():
    if pd.isnull(row['time_diff']) or row['time_diff'] > time_threshold:
        if current_od is not None:
            od_data.append(current_od)
        current_od = {'car_id': row['car_id'], 'start_lat': row['lat'], 'start_lon': row['lon'], 'start_time': row['_time']}
    if current_od is not None:
        current_od['end_lat'] = row['lat']
        current_od['end_lon'] = row['lon']
        current_od['end_time'] = row['_time']

# Append the last trip to od_data
if current_od is not None:
    od_data.append(current_od)

# Convert the OD data to a DataFrame
od_df = pd.DataFrame(od_data)

# Next rental: .shift(-1) takes the value from the next row within each car_id group
next_rent_time = od_df.groupby('car_id').shift(-1)['start_time']
next_rent_time = next_rent_time.fillna(collect_end)

# Parking duration in hours
od_df['next_rent'] = next_rent_time
od_df['parking_duration'] = (od_df['next_rent'] - od_df['end_time']).dt.total_seconds() / 3600

# Show the OD data
print(od_df[['car_id', 'start_time', 'end_time', 'next_rent', 'parking_duration']].to_string())

fig = plt.figure(1, (7, 4), dpi=150)
ax1 = plt.subplot(111)
sns.kdeplot(od_df['parking_duration'], label='parking')
plt.legend()
plt.xticks(range(25), range(25))
plt.xlim(0, 24)
plt.ylabel('Kernel Density')
plt.xlabel('parking (hr)')
plt.show()

# Round the coordinates to three decimal places
# Note: the scooter is parked where the trip ended, so end_lon/end_lat may be the more natural key here
data_tob = od_df[['start_lon', 'start_lat', 'parking_duration']].round(3).copy()
# Median parking duration within each small area
data_tob = data_tob.groupby(['start_lon', 'start_lat'])['parking_duration'].quantile(0.5).reset_index()

# Create the figure
fig = plt.figure(1, (8, 8), dpi=100)
ax = plt.subplot(111)

# Add the basemap
minx, miny, maxx, maxy = sz.total_bounds
bounds = [minx, miny, maxx, maxy]
sz.plot(ax=ax, edgecolor=(0, 0, 0, 0), facecolor=(0, 0, 0, 0.1), linewidth=0.5)
plt.sca(ax)

# Define the colormap
palette_name = "BuPu"
colors = sns.color_palette(palette_name, 3)
colors.reverse()
cmap = mpl.colors.LinearSegmentedColormap.from_list(palette_name, colors)
vmax = data_tob['parking_duration'].quantile(0.95)
norm = mpl.colors.Normalize(vmin=0, vmax=vmax)

# Scatter plot
plt.scatter(data_tob['start_lon'], data_tob['start_lat'], s=0.5, alpha=1,
            c=data_tob['parking_duration'], cmap=cmap, norm=norm)

# Add the scale bar and north arrow
tbd.plotscale(ax, bounds=bounds, textsize=10, compasssize=1, accuracy=2000, rect=[0.6, 0.03], zorder=10)
plt.axis('off')
plt.xlim(bounds[0], bounds[2])
plt.ylim(bounds[1], bounds[3])
plt.title('Utilization efficiency')

# Colorbar (parking_duration is in hours)
cax = plt.axes([0.1, 0.33, 0.02, 0.3])
plt.colorbar(cax=cax)
plt.title('Parking (h)')
plt.show()
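
The parking-duration step can be checked on a small hand-made table before running it on the full data. This is a minimal sketch under stated assumptions: the three trips and the collect_end timestamp are invented, and collect_end is a single timestamp rather than a column.

```python
import pandas as pd

# Toy OD table: car 1 has two trips, car 2 has one. All values are invented.
od_df = pd.DataFrame({
    'car_id': [1, 1, 2],
    'start_time': pd.to_datetime(['2023-05-09 08:00', '2023-05-09 10:00', '2023-05-09 09:00']),
    'end_time':   pd.to_datetime(['2023-05-09 08:30', '2023-05-09 10:30', '2023-05-09 09:15']),
})
collect_end = pd.Timestamp('2023-05-09 12:00')  # assumed end of the collection window

# Start time of the same vehicle's next trip; NaT for each vehicle's last trip
next_rent_time = od_df.groupby('car_id')['start_time'].shift(-1)
# A vehicle's last trip is followed by parking until the end of the window
next_rent_time = next_rent_time.fillna(collect_end)

od_df['next_rent'] = next_rent_time
od_df['parking_duration'] = (od_df['next_rent'] - od_df['end_time']).dt.total_seconds() / 3600

print(od_df[['car_id', 'end_time', 'next_rent', 'parking_duration']].to_string())
```

Grouping before shift(-1) matters: without it, the last trip of car 1 would borrow the start time of car 2's first trip.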