Create Sparse Matrix With Pandas And Fill It With Values From One Column Of .dat File At Indexes [x,y] From Other Two Columns Of .dat File
I have a .dat file that contains three columns: userID, artistID and weight. Using Python, I read the data into a pandas DataFrame with data = pd.read_table('train.dat'). I want to
Solution 1:
With your data copied to a file (and after import pandas as pd and from scipy import sparse):
In [290]: data = pd.read_csv('stack48133358.txt',delim_whitespace=True)
In [291]: data
Out[291]:
   userID  artistID    weight
0      45         7  0.711478
1     204       144  0.464000
2      36       650  2.423289
3     140       146  1.014670
4     170        31  1.412478
5     240       468  0.652999
In [292]: M = sparse.csr_matrix((data.weight, (data.userID, data.artistID)))
In [293]: M
Out[293]:
<241x651 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>
In [294]: print(M)
(36, 650) 2.42328874902
(45, 7) 0.711477987421
(140, 146) 1.01466992665
(170, 31) 1.41247833622
(204, 144) 0.464
(240, 468) 0.652999240699
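To try this without creating a file first, here is a minimal self-contained sketch. It assumes the six sample rows shown above, fed through io.StringIO in place of the real file, and it uses sep=r"\s+" rather than delim_whitespace=True because newer pandas releases deprecate the latter. The names text, rows, cols and vals are only for illustration.

```python
import io

import pandas as pd
from scipy import sparse

# The six sample rows from above; StringIO stands in for the .dat file.
text = """userID artistID weight
45 7 0.711478
204 144 0.464000
36 650 2.423289
140 146 1.014670
170 31 1.412478
240 468 0.652999
"""

# Split columns on any run of whitespace.
data = pd.read_csv(io.StringIO(text), sep=r"\s+")

rows = data["userID"].to_numpy()
cols = data["artistID"].to_numpy()
vals = data["weight"].to_numpy()

# (values, (row_indices, col_indices)) is the coordinate-style constructor.
# Without shape=..., the size is inferred as (max row + 1, max col + 1).
M = sparse.csr_matrix((vals, (rows, cols)))

print(M.shape)  # (241, 651)
print(M.nnz)    # 6
```

Note that if the same (userID, artistID) pair occurs more than once in the file, this constructor adds the weights together rather than keeping the last one, so deduplicate first if that is not what you want.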
I can also load that file with genfromtxt:
In [307]: data = np.genfromtxt('stack48133358.txt',dtype=None, names=True)
In [308]: data
Out[308]:
array([( 45, 7, 0.71147799), (204, 144, 0.464 ),
( 36, 650, 2.42328875), (140, 146, 1.01466993),
(170, 31, 1.41247834), (240, 468, 0.65299924)],
dtype=[('userID', '<i4'), ('artistID', '<i4'), ('weight', '<f8')])
In [309]: M = sparse.csr_matrix((data['weight'], (data['userID'], data['artistID'])))
In [310]: M
Out[310]:
<241x651 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>
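The structured-array route can be sketched as a standalone script too. As before, this assumes the six sample rows and uses io.StringIO as a stand-in for the file. The integer dtype that genfromtxt picks may vary by platform. The final loop is just one way to inspect the stored entries and is not required.

```python
import io

import numpy as np
from scipy import sparse

# Same six sample rows; StringIO stands in for the .dat file.
text = """userID artistID weight
45 7 0.711478
204 144 0.464000
36 650 2.423289
140 146 1.014670
170 31 1.412478
240 468 0.652999
"""

# names=True takes the field names from the header line;
# dtype=None lets NumPy choose a type for each column.
data = np.genfromtxt(io.StringIO(text), dtype=None, names=True)

# Fields of the structured array are accessed by name, like DataFrame columns.
M = sparse.csr_matrix((data["weight"], (data["userID"], data["artistID"])))

# The COO form exposes the (row, col, value) triplets directly.
coo = M.tocoo()
for r, c, v in zip(coo.row, coo.col, coo.data):
    print(r, c, v)
```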