
Create Sparse Matrix With Pandas And Fill It With Values From One Column Of .dat File At Indexes [x,y] From Other Two Columns Of .dat File

I have a .dat file that contains three columns: userID, artistID and weight. Using Python, I read the data into a pandas DataFrame with data = pd.read_table('train.dat'). I want to

Solution 1:

With your data copied to a file (the session below assumes import numpy as np, import pandas as pd and from scipy import sparse):

In [290]: data = pd.read_csv('stack48133358.txt', sep=r'\s+')
In [291]: data
Out[291]: 
   userID  artistID    weight
0      45         7  0.711478
1     204       144  0.464000
2      36       650  2.423289
3     140       146  1.014670
4     170        31  1.412478
5     240       468  0.652999
In [292]: M = sparse.csr_matrix((data.weight, (data.userID, data.artistID)))
In [293]: M
Out[293]: 
<241x651 sparse matrix of type '<class 'numpy.float64'>'
    with 6 stored elements in Compressed Sparse Row format>
In [294]: print(M)
  (36, 650)     2.42328874902
  (45, 7)       0.711477987421
  (140, 146)    1.01466992665
  (170, 31)     1.41247833622
  (204, 144)    0.464
  (240, 468)    0.652999240699
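
As a self-contained sketch of the same idea, the snippet below rebuilds the matrix from the six sample rows shown above, with the file emulated in memory through io.StringIO (an assumption made so the example runs without 'stack48133358.txt'). It also shows that, when no shape is given, the dimensions are inferred as the largest index plus one on each axis, which is why the result is 241x651. The 300x700 shape in the second call is an arbitrary illustrative value, not something from the original question.

```python
import io

import pandas as pd
from scipy import sparse

# Sample rows copied from the DataFrame output above; the file is emulated in memory.
raw = """userID artistID weight
45 7 0.711478
204 144 0.464000
36 650 2.423289
140 146 1.014670
170 31 1.412478
240 468 0.652999
"""

data = pd.read_csv(io.StringIO(raw), sep=r"\s+")

rows = data["userID"].to_numpy()
cols = data["artistID"].to_numpy()
vals = data["weight"].to_numpy()

# (values, (row_indices, col_indices)) is the coordinate-style constructor form.
# Without `shape`, each dimension is inferred as max index + 1.
M = sparse.csr_matrix((vals, (rows, cols)))
print(M.shape, M.nnz)

# Passing `shape` explicitly leaves room for IDs that do not occur in this file.
# The 300 x 700 size is a hypothetical choice for illustration only.
M_fixed = sparse.csr_matrix((vals, (rows, cols)), shape=(300, 700))
print(M_fixed.shape, M_fixed.nnz)
```

Note that if the same (userID, artistID) pair appears more than once in the file, this constructor sums the corresponding weights.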

I can also load that file with genfromtxt:

In [307]: data = np.genfromtxt('stack48133358.txt',dtype=None, names=True)
In [308]: data
Out[308]: 
array([( 45,   7,  0.71147799), (204, 144,  0.464     ),
       ( 36, 650,  2.42328875), (140, 146,  1.01466993),
       (170,  31,  1.41247834), (240, 468,  0.65299924)],
      dtype=[('userID', '<i4'), ('artistID', '<i4'), ('weight', '<f8')])
In [309]: M = sparse.csr_matrix((data['weight'], (data['userID'], data['artistID'])))
In [310]: M
Out[310]: 
<241x651 sparse matrix of type '<class 'numpy.float64'>'
    with 6 stored elements in Compressed Sparse Row format>
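
To check that the two loading routes agree, a minimal sketch along these lines can be used. It again emulates the file in memory with the six sample rows (an assumption for the sake of a runnable example) and compares the resulting matrices with np.allclose rather than exact equality, since the pandas and NumPy text parsers are not guaranteed to round the last floating-point digit identically.

```python
import io

import numpy as np
import pandas as pd
from scipy import sparse

# Sample rows copied from the output above; the file is emulated in memory.
raw = """userID artistID weight
45 7 0.711478
204 144 0.464000
36 650 2.423289
140 146 1.014670
170 31 1.412478
240 468 0.652999
"""

# Route 1: pandas DataFrame columns.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
M_pd = sparse.csr_matrix(
    (df["weight"].to_numpy(), (df["userID"].to_numpy(), df["artistID"].to_numpy()))
)

# Route 2: NumPy structured array; names=True takes field names from the header row.
arr = np.genfromtxt(io.StringIO(raw), dtype=None, names=True, encoding="utf-8")
M_np = sparse.csr_matrix((arr["weight"], (arr["userID"], arr["artistID"])))

# Densifying is fine for a 241 x 651 matrix; avoid it for large data.
same = M_pd.shape == M_np.shape and np.allclose(M_pd.toarray(), M_np.toarray())
print(same)
```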
