Fast filter method in Python


I want to filter two lists using the fastest method available in a Python script. I have used the built-in filter() function for this, but it is quite slow and takes too much time because the lists are very big: I think more than 5 million items in each, or maybe more. I do not know how to go about it. If anybody has an idea, or could write a small function for it, please share.

By : user17451


Maybe your lists are too large and do not fit in memory, and you experience thrashing. If the sources are in files, you do not need the whole list in memory all at once. Try using itertools, e.g.:

from itertools import ifilter

def is_important(s):
   return len(s)>10

filtered_list = ifilter(is_important, open('mylist.txt'))

Note that ifilter returns an iterator, so lines are read and tested one at a time instead of the whole filtered list being built in memory.
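For readers on Python 3: itertools.ifilter was removed there, and the built-in filter() is itself lazy, so the same idea can be sketched as below. This is only an illustration under a few assumptions: important_lines and the sample list are hypothetical names made up here, and the predicate strips the trailing newline before measuring, which the snippet above does not do.

```python
def is_important(line):
    # Same test as above, but ignoring the trailing newline
    # that iterating over a file leaves on each line.
    return len(line.rstrip("\n")) > 10

def important_lines(lines):
    # In Python 3, filter() returns a lazy iterator, so nothing is
    # read or tested until the result is consumed.
    return filter(is_important, lines)

# Hypothetical sample data standing in for the lines of a file.
sample = ["short\n", "a considerably longer line\n", "tiny\n"]
result = list(important_lines(sample))
```

With a real file, passing the open file object (for example inside a with open('mylist.txt') as f: block) in place of sample should stream the lines the same way.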

Generator Tricks for Systems Programmers is a tutorial by David M. Beazley that teaches some interesting uses for generators.

By : gimel

If you can avoid creating the lists in the first place, you'll be happier.

Rather than

aBigList = someListMakingFunction()
filter( lambda x:x>10, aBigList )

You might want to look at your function that makes the list.

def someListMakingGenerator( source ):
    # 'source' stands for wherever the data comes from (a file, a query, ...)
    for x in source:
        yield x

Then your filter doesn't need a giant tract of memory:

def myFilter( aGenerator ):
    for x in aGenerator:
        if x > 10: 
            yield x

By using generators, you don't keep much stuff in memory.
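Chaining the two pieces might look like the sketch below. It assumes Python 3; number_source is a hypothetical stand-in for whatever really produces the data, and the threshold parameter is added here only for illustration.

```python
from itertools import islice

def number_source(n):
    # Hypothetical data source: yields 0 .. n-1 one at a time
    # instead of building a list of n items.
    for x in range(n):
        yield x

def my_filter(items, threshold=10):
    # Pass through only the items greater than the threshold.
    for x in items:
        if x > threshold:
            yield x

# Nothing is computed yet; the pipeline only describes the work.
pipeline = my_filter(number_source(5000000))

# Pull just the first three matches; the source is advanced only
# as far as needed to produce them.
first_three = list(islice(pipeline, 3))
```

Because each stage hands over one item at a time, memory use stays roughly constant however large the source is.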

By : S.Lott

It may be useful to know that a list comprehension with the condition written inline is generally faster than one that calls a lambda for each element, since it avoids a function call per item:

>>> import timeit
>>> timeit.Timer('[x for x in xrange(10) if (x**2 % 4) == 1]').timeit()
>>> timeit.f = lambda x: (x**2 % 4) == 1
>>> timeit.Timer('[x for x in xrange(10) if f(x)]').timeit()

(Not sure why I needed to put f in the timeit namespace, there. Haven't really used the module much.)
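On the namespace point: timeit runs the statement in its own namespace, so names defined in the interactive session are not visible unless they are handed over. A minimal sketch, assuming Python 3.5 or later (where timeit.timeit accepts a globals argument and range replaces xrange); the repeat count of 10000 is an arbitrary choice for illustration.

```python
import timeit

def f(x):
    # Same predicate as the inline condition, wrapped in a function.
    return (x ** 2 % 4) == 1

# Condition written inline in the comprehension.
inline_time = timeit.timeit(
    '[x for x in range(10) if (x**2 % 4) == 1]',
    number=10000,
)

# Condition evaluated through a function call; f is made visible
# to the timed statement via the globals argument.
called_time = timeit.timeit(
    '[x for x in range(10) if f(x)]',
    globals={'f': f},
    number=10000,
)

print("inline: %.4fs, via f(): %.4fs" % (inline_time, called_time))
```

The actual numbers depend on the machine and Python version, so it is worth running both on your own data before picking one.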
