Pandas Scraped Data Not Working In Pandas
Solution 1:
I think you need to change:
df1.WE=np.where(df3.AL.isin(df1.EW),df1.WE,np.nan)
to
df1.WE=np.where(df1.EW.isin(df2.AL),df1.WE,np.nan)
The problem is that the DataFrames have different lengths with your real data. So the condition has to be built from a column of df1 compared against the other data: the comparison then returns a mask with the same length as df1, and there is no error.
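A minimal sketch of why this works, using small made-up frames in place of the real CSV data (the values, and using a float WE column, are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real data: df1 is longer than df2.
df1 = pd.DataFrame({"EW": [1.40, 2.10, 3.30, 1.53, 2.00],
                    "WE": [10.0, 20.0, 30.0, 40.0, 50.0]})
df2 = pd.DataFrame({"AL": [1.40, 1.53, 9.99]})

# The mask is built from df1.EW, so it has one entry per row of df1,
# no matter how many rows df2 has.
mask = df1.EW.isin(df2.AL)
print(len(mask) == len(df1))

# Keep WE where EW occurs in df2.AL, otherwise set NaN.
df1["WE"] = np.where(mask, df1.WE, np.nan)
print(df1)
```

Because the mask and df1.WE come from the same frame, np.where never has to reconcile two different lengths.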
With your data:
df1 = pd.read_csv('df1.csv', names=['a','b','c'])
print (df1.head())
a b \
0 Ponte Preta U20 v Cruzeiro U20 2.10
1 Fluminense RJ U20 v Defensor Sporting U20 2.00
2 Gremio RS U20 v Palmeiras U20 3.30
3 Barcelona v Sporting 1.33
4 Bayern Munich v PSG 2.40
c
0 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
1 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
2 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
3 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
4 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
df2 = pd.read_csv('df2.csv', names=['a','b','c', 'd', 'e'])
print (df2.head())
a b c d \
0 In-Play CSKA Moscow U19 Man Utd U19 1.14
1 In-Play Atletico Madrid U19 Chelsea U19 1.01
2 In-Play Juventus U19 Olympiakos U19 1.40
3 Starting in22' Paris St-G U19 Bayern Munich U19 2.24
4 Today 21:00 Man City U19 Shakhtar U19 2.66
e
0 https://www.betfair.com.au/exchange/plus/footb...
1 https://www.betfair.com.au/exchange/plus/footb...
2 https://www.betfair.com.au/exchange/plus/footb...
3 https://www.betfair.com.au/exchange/plus/footb...
4 https://www.betfair.com.au/exchange/plus/footb...
Compare the numeric columns, here b and d:
df1.b=np.where(df1.b.isin(df2.d),df1.b,np.nan) #the first 5 values are NaN
print (df1.head())
a b \
0 Ponte Preta U20 v Cruzeiro U20 NaN
1 Fluminense RJ U20 v Defensor Sporting U20 NaN
2 Gremio RS U20 v Palmeiras U20 NaN
3 Barcelona v Sporting NaN
4 Bayern Munich v PSG NaN
c
0 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
1 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
2 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
3 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
4 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
#check whether there are any non-NaN values in column b
print (df1[df1.b.notnull()])
a b \
23 Swindon v Forest Green 1.40
50 Sportivo Barracas v Canuelas FC 13.00
80 FC Nitra 1.53
81 0-0 1.40
83 Cape Town City v Maritzburg Utd 1.53
84 Mamelodi Sundowns v Baroka FC 3.75
90 Dorking Wanderers v Tonbridge Angels 1.53
95 Coalville Town v Stamford 1.40
c
23 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
50 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
80 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
81 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
83 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
84 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
90 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
95 https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...
Also, the problem with your test data is that both DataFrames have the same number of rows (4), so no error is raised.
Solution 2:
On a side note, I'd recommend using pandas functions with pandas:
df1.loc[~df1.EW.isin(df2.AL), 'WE'] = np.nan
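As a rough check that the .loc form gives the same result as the np.where form, here is a small sketch on made-up data (the values are hypothetical; WE is kept as float so it can hold NaN without a dtype change):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the real scraped frames.
df1 = pd.DataFrame({"EW": [1.40, 2.10, 1.53], "WE": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({"AL": [1.40, 1.53]})

# Result of the numpy version, computed before df1 is modified.
expected = np.where(df1.EW.isin(df2.AL), df1.WE, np.nan)

# pandas version: blank out WE on the rows whose EW is NOT in df2.AL.
df1.loc[~df1.EW.isin(df2.AL), "WE"] = np.nan

# Both approaches should agree, treating NaN == NaN as equal.
same = np.array_equal(df1.WE.to_numpy(), expected, equal_nan=True)
print(same)
```

The .loc version modifies only the rows selected by the mask and leaves the rest of the column untouched.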
Solution 3:
OK, let's go back to the drawing board. The code above is cleaner, but it does exactly the same thing you are doing with numpy. Let's take your code apart.
1) I highly recommend using Jupyter notebooks to play with the data and understand what is going on at each line. Take a look here, for example: https://gist.github.com/Casyfill/f432966ebabd93f4271e27a1e2e76579
So, your df1 has 100 rows and 3 columns. Your df2 has 42 rows and 5 columns.
Now, you create df3 as an empty dataframe (0 rows) with 12 columns (by the way, perhaps you should use more explanatory column names). This step is totally fine, although you don't have to define all columns beforehand.
Let's go to the second line: df3['DAT'] = df2['AA']
Here you basically copy the column from the second dataframe. As df3 didn't have any rows before, this is a totally legitimate operation, and it creates 42 rows in your df3. Again, this line by itself is fine.
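To see this row-creating behaviour in isolation, here is a small sketch. The frame contents are made up, and only two of the twelve column names are used (a 3-row df2 stands in for the real 42-row one):

```python
import pandas as pd

# Hypothetical stand-in for df2; the real one has 42 rows.
df2 = pd.DataFrame({"AA": ["x", "y", "z"]})

# Empty frame: columns only, no rows.
df3 = pd.DataFrame(columns=["DAT", "AL"])
rows_before = len(df3)

# Assigning a Series to a column of an empty frame adopts the Series' index,
# so df3 now has as many rows as df2.
df3["DAT"] = df2["AA"]
rows_after = len(df3)

print(rows_before, rows_after)
```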
Now, the last line. The logic here is the following: first, for each row in df3, we check whether the value of the df3.AL cell is in the df1.EW column. Just note that we never assigned anything to df3.AL, so the whole column contains only NaNs, and therefore this check by itself does not make any sense.
Next, let's assume there is something in df3.AL. As we check everything row-wise, the test returns a pd.Series (think: one column) of booleans with 42 rows. Now we try to use this column as a "mask" that defines whether df1.WE should stay the same or default to NaN. But you can't do that, because df1 has 100 rows, not 42! Hence, we get an error.
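The mismatch can be reproduced with synthetic frames of the same sizes. This is only a sketch: the values are invented, and only the row counts (100 and 42) come from the description above:

```python
import numpy as np
import pandas as pd

# Synthetic frames with the row counts described above.
df1 = pd.DataFrame({"EW": np.arange(100, dtype=float),
                    "WE": np.arange(100, dtype=float)})
df3 = pd.DataFrame({"AL": [np.nan] * 42})  # AL was never filled in

# One boolean per row of df3: 42 values, all False because AL is all NaN.
mask = df3.AL.isin(df1.EW)
print(len(mask), mask.any())

# A 42-element mask cannot be combined with 100 values of df1.WE.
try:
    np.where(mask, df1.WE, np.nan)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

With equal-length test frames the same call goes through, which is why small hand-made samples can hide the problem.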
So you need to redefine what you actually want to do here; it is not clear from the code what the intended result is.