Using Geopy In A Dataframe To Get Distances

I am new to Geopy. I work at a transportation company and need to get the total kilometers that a truck has operated. I have seen some answers here but they did not work f

Solution 1:

Create a point Series:

import pandas as pd

df = pd.DataFrame(
    [
        (-25.145439,  -54.294871),
        (-24.144564,  -54.240094),
        (-24.142564,  -54.198901),
        (-24.140093,  52.119021),
    ],
    columns=['latitude', 'longitude']
)

from geopy import Point
from geopy.distance import distance

df['point'] = df.apply(lambda row: Point(latitude=row['latitude'], longitude=row['longitude']), axis=1)
In [2]: df
Out[2]:
    latitude  longitude                                point
0 -25.145439 -54.294871  25 8m 43.5804s S, 54 17m 41.5356s W
1 -24.144564 -54.240094  24 8m 40.4304s S, 54 14m 24.3384s W
2 -24.142564 -54.198901  24 8m 33.2304s S, 54 11m 56.0436s W
3 -24.140093  52.119021    24 8m 24.3348s S, 52 7m 8.4756s E

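Add a new shifted point_next Series. shift(1) places the previous row's point on each row; the first row has no predecessor, so it is set to None: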

df['point_next'] = df['point'].shift(1)
df.loc[df['point_next'].isna(), 'point_next'] = None
In [4]: df
Out[4]:
    latitude  longitude                                point                           point_next
0 -25.145439 -54.294871  25 8m 43.5804s S, 54 17m 41.5356s W                                 None
1 -24.144564 -54.240094  24 8m 40.4304s S, 54 14m 24.3384s W  25 8m 43.5804s S, 54 17m 41.5356s W
2 -24.142564 -54.198901  24 8m 33.2304s S, 54 11m 56.0436s W  24 8m 40.4304s S, 54 14m 24.3384s W
3 -24.140093  52.119021    24 8m 24.3348s S, 52 7m 8.4756s E  24 8m 33.2304s S, 54 11m 56.0436s W

Calculate the distances:

df['distance_km'] = df.apply(lambda row: distance(row['point'], row['point_next']).km if row['point_next'] is not None else float('nan'), axis=1)
df = df.drop('point_next', axis=1)
In [6]: df
Out[6]:
    latitude  longitude                                point   distance_km
0 -25.145439 -54.294871  25 8m 43.5804s S, 54 17m 41.5356s W           NaN
1 -24.144564 -54.240094  24 8m 40.4304s S, 54 14m 24.3384s W    111.003172
2 -24.142564 -54.198901  24 8m 33.2304s S, 54 11m 56.0436s W      4.192654
3 -24.140093  52.119021    24 8m 24.3348s S, 52 7m 8.4756s E  10449.661388
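
Since the question asks for the total kilometers, one possible last step is to sum the distance_km column. This is a minimal sketch that reuses the per-leg values from the output above rather than recomputing them; the name total_km is only illustrative. pandas skips NaN when summing by default, so the first row needs no special handling.

```python
import pandas as pd

# Per-leg distances copied from the distance_km column above;
# the first row is NaN because it has no previous point.
df = pd.DataFrame(
    {'distance_km': [float('nan'), 111.003172, 4.192654, 10449.661388]}
)

# Series.sum() ignores NaN by default (skipna=True).
total_km = df['distance_km'].sum()
print(round(total_km, 3))
```

Almost all of this total comes from the last leg, because the final sample longitude is positive (east) while the others are negative (west).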

Solution 2:

Be aware that calling geopy's distance() row by row through df.apply(..., axis=1) is very slow on large datasets (hundreds of thousands of rows).

One workaround is the Haversine formula, which can be vectorized efficiently with pandas/numpy, although it may be less precise because it treats the Earth as a sphere. Another option is geopandas, if you are OK with external packages.
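
A minimal sketch of the vectorized Haversine approach, using the sample coordinates from Solution 1. The helper name haversine_km and the mean Earth radius of 6371.0088 km are assumptions of this sketch, not part of geopy or pandas. Because the formula assumes a spherical Earth, the results can differ from geopy's geodesic distance, typically by well under one percent.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars, arrays, or Series of degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


df = pd.DataFrame(
    [
        (-25.145439, -54.294871),
        (-24.144564, -54.240094),
        (-24.142564, -54.198901),
        (-24.140093, 52.119021),
    ],
    columns=['latitude', 'longitude'],
)

# Pair every row with the previous row's coordinates in one vectorized call.
# The first row has no predecessor, so its distance is NaN.
df['distance_km'] = haversine_km(
    df['latitude'].shift(1), df['longitude'].shift(1),
    df['latitude'], df['longitude'],
)

total_km = df['distance_km'].sum()  # NaN is skipped by default
print(df)
print(total_km)
```

No Point objects or per-row Python calls are involved here, which is what makes this practical for large frames.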