
What Does Each Item Mean In Svmlight Format

I am very confused about what each part means in the svmLight data format. For example: (label/target, [(feature, value), ...], queryid). Does the label mean the rank of the data and …

Solution 1:

The leading number is indeed the "target" of this object. The qid:1 part restricts which objects are compared when pairwise ranking constraints are generated: two objects give rise to a preference constraint only if they share the same qid. The docid, or rather everything after the final #, is an info string that "can be used to pass additional information to the kernel (e.g. non feature vector data)" (source).
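To make the qid rule concrete, here is a minimal Python sketch (not SVMlight's internal code) that, assuming each object has already been reduced to a (target, qid) tuple, lists the preference pairs a ranking learner would consider. Only objects with the same qid are paired, the one with the higher target is preferred, and ties are skipped because neither object should rank above the other. The function name `preference_pairs` is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def preference_pairs(examples: List[Tuple[float, int]]) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs meaning "example i should rank above example j".

    Hypothetical helper: each example is a (target, qid) tuple. A pair is
    formed only when both examples share a qid and their targets differ.
    """
    pairs = []
    for i, j in combinations(range(len(examples)), 2):
        (ti, qi), (tj, qj) = examples[i], examples[j]
        if qi != qj or ti == tj:
            continue  # different query, or a tie: no constraint
        # Put the higher-target example first.
        pairs.append((i, j) if ti > tj else (j, i))
    return pairs


# Three objects for query 1, two for query 2.
data = [(3, 1), (2, 1), (1, 1), (1, 2), (2, 2)]
print(preference_pairs(data))  # [(0, 1), (0, 2), (1, 2), (4, 3)]
```

Objects from query 1 are never compared with objects from query 2, which is why the targets only need to be meaningful relative to other objects with the same qid.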

The general format for each object is given in the official source, under the heading "How to use":

<line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
<target> .=. +1 | -1 | 0 | <float>
<feature> .=. <integer> | "qid"
<value> .=. <float>
<info> .=. <string>
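Reading the grammar in the other direction can help too. The following is a small sketch of writing one object as an SVMlight line: the target, then an optional qid, then feature:value pairs in increasing feature order, then an optional "# info" comment. The name `format_line` is hypothetical, and the assumption that qid values are integers comes from common usage rather than from the grammar above, which only says values are floats.

```python
from typing import Dict, Optional


def format_line(target: float, features: Dict[int, float],
                qid: Optional[int] = None, info: Optional[str] = None) -> str:
    """Build one SVMlight data line (hypothetical helper)."""
    parts = [f"{target:g}"]
    if qid is not None:
        parts.append(f"qid:{qid}")  # assumes an integer query id
    # Feature numbers are written in increasing order.
    parts.extend(f"{feature}:{value:g}" for feature, value in sorted(features.items()))
    line = " ".join(parts)
    if info is not None:
        line += f" # {info}"  # free-form info string after '#'
    return line


print(format_line(3, {2: 1.0, 1: 1.0, 4: 0.2}, qid=1, info="1A"))
# 3 qid:1 1:1 2:1 4:0.2 # 1A
```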

Note that the format you specify

(label/target, [(feature, value), ...], queryid)

is that of pysvmlight, a Python binding to SVM-Light, the support vector machine library by Thorsten Joachims whose documentation I quoted above. You'll need to write a parser that converts data files in svmlight's native format into the format that pysvmlight uses. There is at least one example on StackOverflow; it does not handle the qid, but adding that shouldn't be too difficult once you have read that parser's code.
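A minimal sketch of such a parser, assuming the input follows the grammar above and that qid values are integers. It yields tuples shaped like (label, [(feature, value), ...], queryid). The name `read_svmlight` is hypothetical, the info string after # is simply dropped, and queryid is None for lines without a qid; I haven't checked what your pysvmlight version expects in that case, so adjust as needed.

```python
import io
from typing import Iterable, Iterator, List, Optional, Tuple

Example = Tuple[float, List[Tuple[int, float]], Optional[int]]


def read_svmlight(lines: Iterable[str]) -> Iterator[Example]:
    """Yield (label, [(feature, value), ...], queryid) per data line.

    Hypothetical helper: queryid is None when a line has no qid field.
    """
    for raw in lines:
        # Everything after '#' is the info string; this sketch discards it.
        body = raw.split("#", 1)[0].strip()
        if not body:
            continue  # blank or comment-only line
        tokens = body.split()
        label = float(tokens[0])  # handles "+1", "-1", "0" and other floats
        qid: Optional[int] = None
        features: List[Tuple[int, float]] = []
        for token in tokens[1:]:
            name, value = token.split(":", 1)
            if name == "qid":
                qid = int(value)
            else:
                features.append((int(name), float(value)))
        yield label, features, qid


sample = io.StringIO(
    "# query 1\n"
    "3 qid:1 1:1 2:1 3:0 4:0.2 5:0 # 1A\n"
    "+1 1:0.5 3:2\n"
)
for example in read_svmlight(sample):
    print(example)
# (3.0, [(1, 1.0), (2, 1.0), (3, 0.0), (4, 0.2), (5, 0.0)], 1)
# (1.0, [(1, 0.5), (3, 2.0)], None)
```

Because it takes any iterable of lines, the same function works on an open file, e.g. `with open(path) as f: data = list(read_svmlight(f))`.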
