Using Geopy In A Dataframe To Get Distances
I am new to geopy. I work at a transportation company and need to get the total kilometers a truck has traveled. I have seen some answers here, but they did not work for me…
Solution 1:
Create a point Series:
import pandas as pd
from geopy import Point
from geopy.distance import distance
# Sample GPS fixes, one row per reading, in travel order
df = pd.DataFrame(
    [
        (-25.145439,  -54.294871),
        (-24.144564,  -54.240094),
        (-24.142564,  -54.198901),
        (-24.140093,  52.119021),
    ],
    columns=['latitude', 'longitude']
)
# Wrap each row's coordinates in a geopy Point
df['point'] = df.apply(lambda row: Point(latitude=row['latitude'], longitude=row['longitude']), axis=1)
In [2]: df
Out[2]:
    latitude  longitude                                point
0 -25.145439 -54.294871  25 8m 43.5804s S, 54 17m 41.5356s W
1 -24.144564 -54.240094  24 8m 40.4304s S, 54 14m 24.3384s W
2 -24.142564 -54.198901  24 8m 33.2304s S, 54 11m 56.0436s W
3 -24.140093  52.119021    24 8m 24.3348s S, 52 7m 8.4756s E
Add a new shifted point_next Series:
# Shift down one row: despite its name, point_next holds the previous row's point
df['point_next'] = df['point'].shift(1)
# The first row has no predecessor; store None there instead of NaN
df.loc[df['point_next'].isna(), 'point_next'] = None
In [4]: df
Out[4]:
    latitude  longitude                                point                           point_next
0 -25.145439 -54.294871  25 8m 43.5804s S, 54 17m 41.5356s W                                 None
1 -24.144564 -54.240094  24 8m 40.4304s S, 54 14m 24.3384s W  25 8m 43.5804s S, 54 17m 41.5356s W
2 -24.142564 -54.198901  24 8m 33.2304s S, 54 11m 56.0436s W  24 8m 40.4304s S, 54 14m 24.3384s W
3 -24.140093  52.119021    24 8m 24.3348s S, 52 7m 8.4756s E  24 8m 33.2304s S, 54 11m 56.0436s W
Calculate the distances:
# Distance in km from the previous point; NaN for the first row
df['distance_km'] = df.apply(lambda row: distance(row['point'], row['point_next']).km if row['point_next'] is not None else float('nan'), axis=1)
df = df.drop('point_next', axis=1)
In [6]: df
Out[6]:
    latitude  longitude                                point   distance_km
0 -25.145439 -54.294871  25 8m 43.5804s S, 54 17m 41.5356s W           NaN
1 -24.144564 -54.240094  24 8m 40.4304s S, 54 14m 24.3384s W    111.003172
2 -24.142564 -54.198901  24 8m 33.2304s S, 54 11m 56.0436s W      4.192654
3 -24.140093  52.119021    24 8m 24.3348s S, 52 7m 8.4756s E  10449.661388
Solution 2:
Be aware that calling geopy's distance row by row with .apply(..., axis=1) will be very slow on large amounts of data (hundreds of thousands of rows), because every row runs a separate Python-level computation.
One workaround is the haversine formula, which can be vectorized efficiently over whole pandas/NumPy columns. It is somewhat less precise, though: it treats the Earth as a sphere, whereas geopy's distance computes a geodesic on the WGS-84 ellipsoid. Another option is GeoPandas, if you are OK with adding an external package.
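A minimal sketch of the vectorized haversine approach, which also sums the segment distances into the total kilometers per truck that the question asks for. Assumptions: rows are already sorted in travel order within each truck, the truck_id column is hypothetical (the original data has none), and a mean Earth radius of 6371.0088 km is used, so results differ slightly from geopy's ellipsoidal distance (under 1% for these points).

```python
import numpy as np
import pandas as pd

# Mean Earth radius in km; haversine assumes a spherical Earth (an approximation)
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, scalars or arrays/Series."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # Cap at 1 so floating-point rounding cannot push arcsin out of its domain
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.minimum(a, 1.0)))


# Same coordinates as above, plus a hypothetical truck_id column
df = pd.DataFrame(
    {
        'truck_id': ['A', 'A', 'A', 'B'],
        'latitude': [-25.145439, -24.144564, -24.142564, -24.140093],
        'longitude': [-54.294871, -54.240094, -54.198901, 52.119021],
    }
)

# Previous fix of the same truck (NaN for each truck's first row)
prev = df.groupby('truck_id')[['latitude', 'longitude']].shift(1)

# One vectorized call over all rows instead of a Python call per row
df['distance_km'] = haversine_km(
    prev['latitude'], prev['longitude'], df['latitude'], df['longitude']
)

# Total kilometers per truck; NaN segments are skipped by sum()
total_km = df.groupby('truck_id')['distance_km'].sum()
print(total_km)
```

Shifting within each truck group keeps the last fix of one truck from being paired with the first fix of the next, which a plain shift over the whole frame would do.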