How To Get Unique Values Set From A Repeating Values List
Solution 1:
Perl 'one-liner', indented and expanded so that everything fits in the window:
$ perl -lane '
    $hash{ lc $F[0] }{ $F[1] }++;
  } END {
    for my $columnA ( keys %hash ) {
        print $columnA, " - ", join( ",", keys %{ $hash{$columnA} } );
    }
'
Explanation will follow if I see a concerted attempt on the part of the original poster.
Solution 2:
I would use a Python dictionary whose keys are the column A values and whose values are instances of Python's built-in set type, each holding the column B values seen for that key:
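def parse_the_file():
    # Bind the str methods to local names once, outside the loop
    lower = str.lower
    split = str.split
    d = {}
    with open('f.txt') as f:
        for line in f:
            fields = split(line)
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            A, B = fields
            try:
                d[lower(A)].add(B)
            except KeyError:
                # {B}, not set(B): set(B) would split "10" into "1" and "0"
                d[lower(A)] = {B}
    for a in d:
        print("%s - %s" % (a, ",".join(d[a])))

if __name__ == "__main__":
    parse_the_file()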
The advantage of using a dictionary is that you'll have a single dictionary key per column A value. The advantage of using a set is that you'll have a unique set of column B values.
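The same dictionary-of-sets idea can also be sketched with `collections.defaultdict`, which removes the try/except bookkeeping. This is only an illustrative variant, not the code above: the function name `group_unique` and the inline sample lines are assumptions, and the values are sorted purely to make the printed order predictable.

```python
from collections import defaultdict


def group_unique(lines):
    """Map each lowercased column A value to the set of its column B values."""
    groups = defaultdict(set)  # a missing key starts out as an empty set
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # ignore blank or malformed lines
        key, value = fields
        groups[key.lower()].add(value)
    return groups


if __name__ == "__main__":
    sample = ["xxxA 2", "xxxA 1", "xxxB 2", "XXXC 3", "XXXA 3", "xxxD 4"]
    for key, values in sorted(group_unique(sample).items()):
        print("%s - %s" % (key, ",".join(sorted(values))))
```

Adding to a set is idempotent, so a repeated line such as a second "xxxA 2" leaves the result unchanged.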
Efficiency notes:
- Using try/except is more efficient than an if/else check for the initial case when most keys have already been seen, because the exception is raised only the first time each key appears.
- Binding the str functions to local names outside of the loop is more efficient than looking them up inside the loop.
- Depending on the proportion of new A values vs. reappearance of A values throughout the file, you may consider using
a = lower(A)
before the try/except statement, so that lower is not called a second time in the except branch.
- I used a function, as accessing local variables is more efficient in Python than accessing global variables.
- Some of these performance tips are from here
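The first note is easy to measure rather than take on faith. The sketch below times a try/except version against a membership check using `timeit`. The function names, the generated data, and the repeat count are all assumptions chosen for illustration, and the outcome depends on the interpreter version and on how often a key is new, so no timing is quoted here.

```python
import timeit


def group_try(pairs):
    """Try/except variant: the exception fires once per distinct key."""
    d = {}
    for a, b in pairs:
        try:
            d[a].add(b)
        except KeyError:
            d[a] = {b}
    return d


def group_check(pairs):
    """Membership-test variant: one extra lookup on every iteration."""
    d = {}
    for a, b in pairs:
        if a in d:
            d[a].add(b)
        else:
            d[a] = {b}
    return d


if __name__ == "__main__":
    # Hypothetical data: 100 distinct keys, each repeated many times
    pairs = [("key%d" % (i % 100), str(i % 7)) for i in range(100000)]
    for fn in (group_try, group_check):
        seconds = timeit.timeit(lambda: fn(pairs), number=10)
        print("%s: %.3f s" % (fn.__name__, seconds))
```

Changing the modulus in the generated keys shifts the ratio of new to repeated keys, which is the variable the notes above say matters.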
Testing the code above on your input example yields:
xxxd - 4
xxxa - 1,3,2
xxxb - 2
xxxc - 3
Solution 3:
You can use this simple multimap:
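class MultiMap(object):
    def __init__(self):
        # Per-instance dict; a class-level dict would be shared by all instances
        self.values = {}

    def __getitem__(self, index):
        return self.values[index]

    def __setitem__(self, index, value):
        # Assignment appends to the key's list instead of overwriting it
        if index not in self.values:
            self.values[index] = []
        self.values[index].append(value)

    def __repr__(self):
        return repr(self.values)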
See it in action: http://codepad.org/xOOrlbnf
Solution 4:
Simple Perl version:
#!/usr/bin/perl
use strict;
use warnings;

my (%v, @row);
foreach (<DATA>) {
    chomp;
    $_ = lc($_);
    @row = split(/\s+/, $_);
    push( @{ $v{$row[0]} }, $row[1]);
}
foreach (sort keys %v) {
    print "$_ - ", join( ", ", @{ $v{$_} } ), "\n";
}
__DATA__
xxxA 2
xxxA 1
xxxB 2
XXXC 3
XXXA 3
xxxD 4
I did not focus on variable names. From the example, I see that the column A values are not case sensitive, so each line is lowercased first.
Solution 5:
f = """xxxA 2
xxxA 1
xxxB 2
XXXC 3
XXXA 3
xxxD 4"""
d = {}
for line in f.split("\n"):
    key, val = line.lower().split()
    try:
        d[key].append(val)
    except KeyError:
        d[key] = [val]
print(d)
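Solutions 3 to 5 collect the values in lists, so a repeated pair in the input would show up twice in the result. If duplicates should be dropped while keeping first-seen order, one option is to pass each list through `dict.fromkeys`. This is a sketch that assumes Python 3.7 or later, where dictionaries keep insertion order. The helper name `unique_in_order` and the sample dictionary are hypothetical.

```python
def unique_in_order(values):
    """Drop repeated items while keeping the order of first appearance."""
    # dict keys are unique and (Python 3.7+) keep insertion order
    return list(dict.fromkeys(values))


if __name__ == "__main__":
    # Shape of the dictionary built by the list-based solutions,
    # with one duplicate added for illustration
    d = {"xxxa": ["2", "1", "3", "2"], "xxxb": ["2"]}
    for key in d:
        print("%s - %s" % (key, ",".join(unique_in_order(d[key]))))
```

If order does not matter, wrapping each list in `set(...)` gives the same unique values with less ceremony.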