
Lowercase First Element Of Tuple In List Of Tuples

I have a list of documents, labeled with their appropriate categories: documents = [(list(corpus.words(fileid)), category) for category in corpus.categories()

Solution 1:

So your data structure is [([str], str)]: a list of tuples, where each tuple is (list of strings, string). It's important to understand what that means before you try to pull data out of it.

That means that for item in documents iterates over the list of tuples, where item is each tuple in turn.

That means that item[0] is the list in each tuple.

That means that for item in documents: for s in item[0]: will iterate through each string inside that list. Let's try that!

[s.lower() for item in documents for s in item[0]]

This should give, from your example data:

[u'a', u'p', u'i', u'o', u'a', u'm', ...]

If you're trying to keep the tuple format, you could do:

[([s.lower() for s in item[0]], item[1]) for item in documents]

# or perhaps more readably
[([s.lower() for s in lst], val) for lst, val in documents]

Both these statements give:

[([u'a', u'p', u'i', u'o', u'a', u'm', ...], 'cancer'), ... ]
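
As a minimal runnable sketch of the two approaches above, assuming a small hand-made documents list in place of the real corpus (the sample words and categories here are invented purely for illustration):

```python
# Hypothetical sample data standing in for the corpus-derived list.
documents = [
    (["A", "Patient", "Is"], "cancer"),
    (["Heart", "Rate"], "cardio"),
]

# Flatten: every word from every document, lowercased.
flat = [s.lower() for item in documents for s in item[0]]

# Keep the (words, category) shape, lowercasing only the words.
lowered = [([s.lower() for s in words], category)
           for words, category in documents]

print(flat)
print(lowered)
```

Note that both comprehensions build new lists; the original documents list is left untouched.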

Solution 2:

You are close. You are looking for a construction like this:

[([s.lower() for s in ls], cat) for ls, cat in documents]

Which essentially puts these two together:

[[x.lower() for x in element] for element in documents]
[(x.lower(), y) for x, y in documents]

Solution 3:

Try this:

documents = [([word.lower() for word in corpus.words(fileid)], category)
              for category in corpus.categories()
              for fileid in corpus.fileids(category)]
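
To try this without the real corpus installed, here is a rough sketch using a stub object. FakeCorpus and its contents are hypothetical; it only mimics the three methods used in the question (categories, fileids, words) and is not the actual corpus reader:

```python
class FakeCorpus:
    """Hypothetical stand-in exposing only the methods used in the question."""

    _files = {
        "cancer": {"c1.txt": ["A", "Tumor"]},
        "cardio": {"h1.txt": ["Heart", "Rate"]},
    }

    def categories(self):
        return list(self._files)

    def fileids(self, category):
        return list(self._files[category])

    def words(self, fileid):
        # Look the file up in whichever category contains it.
        for files in self._files.values():
            if fileid in files:
                return files[fileid]
        raise KeyError(fileid)


corpus = FakeCorpus()

# Lowercase the words while the list is being built, so no second pass is needed.
documents = [([word.lower() for word in corpus.words(fileid)], category)
             for category in corpus.categories()
             for fileid in corpus.fileids(category)]

print(documents)
```

Lowercasing at construction time avoids building the mixed-case list first and then rewriting it.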

Solution 4:

Normally, tuples are immutable. However, since your first element of each tuple is a list, that list is mutable, so you can modify its contents without changing the tuple ownership of that list:

documents = [(...what you originally posted...) ... etc. ...]

for d in documents:
    # to lowercase all strings in the list
    # trailing '[:]' is important, need to modify list in place using slice
    d[0][:] = [w.lower() for w in d[0]]

    # or to just lower-case the first element of the list (which is what you asked for)
    d[0][0] = d[0][0].lower()

You can't just call lower() on a string and have it get updated - lower() returns a new string. So to modify the string to be the lowercased version, you have to assign over it. This would not be possible if the string were itself a tuple member, but since the string you are modifying is in a list in the tuple, you can modify the list contents without modifying the tuple's ownership of the list.
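
A small sketch of that distinction, again using made-up sample data: slice assignment changes the list's contents while the tuple keeps pointing at the same list object, whereas trying to rebind the tuple's slot raises TypeError.

```python
documents = [(["Hello", "World"], "greeting")]  # hypothetical sample data

doc = documents[0]
words_before = doc[0]

# Slice assignment replaces the list's contents but keeps the same list object.
doc[0][:] = [w.lower() for w in doc[0]]
same_object = doc[0] is words_before  # the tuple still holds the original list

# Rebinding a tuple slot, by contrast, is not allowed.
try:
    doc[0] = ["something", "else"]
    rebind_failed = False
except TypeError:
    rebind_failed = True

print(documents, same_object, rebind_failed)
```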
