[CPS-devel] nxlucene persistent transport

Jean-Marc Orliaguet jmo at ita.chalmers.se
Tue May 1 16:25:41 CEST 2007


Julien Anguenot wrote:
> Jean-Marc Orliaguet wrote:
>   
>> Julien Anguenot wrote:
>>     
>>>> apart from that, the NXLucene server died a day ago. I'm not sure if
>>>> this is related to the persistent connection bug on the client, or if it
>>>> is due to a log rotation that failed, or an invalid query. I'm still
>>>> trying to figure it out. But since the Lucene server was restarted it has
>>>> worked without any problem.
>>>>     
>>>>         
>>> Did you get a core dump ?
>>>
>>>   
>>>       
>> no, it looks more and more like a memory leak to me. The size of the
>> twistd process increases too fast; it never stabilizes or decreases again.
>> I tried with pylucene-2.1.0 and it looks much better, but still... memory
>> figures go up when re-indexing a site.
>>     
>
> yes, we see the same thing here. You can switch the NXLucene server to
> DEBUG mode and see the Python GC status. The Python object counts are
> constant there, thus I can only suppose the memory leak is at the GCJ GC
> level... which is definitely hard to track... I would suggest restarting
> your server once per day to flush the RAM...
>   

OK, so basically it is normal behaviour. Currently the memory usage is
only at 2% of the total RAM (230MB), and it has stabilized. GCJ 4.2.0
definitely helped...
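
For anyone wanting to watch the same figures over time, here is a minimal
sketch in Python 3, assuming a Linux host where /proc/<pid>/status is
readable. The helper names (parse_vm_kb, read_vm_kb) and the sample text are
made up for illustration; none of this is part of NXLucene.

```python
import re


def parse_vm_kb(status_text):
    """Extract VmSize and VmRSS (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"^(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_vm_kb(pid):
    """Read the current figures for a live process (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return parse_vm_kb(handle.read())


# Hypothetical sample in the /proc status layout, not real measurements.
SAMPLE = "Name:\ttwistd\nVmSize:\t    1000 kB\nVmRSS:\t     500 kB\n"
print(parse_vm_kb(SAMPLE))
```

Calling read_vm_kb with the twistd pid at regular intervals and logging the
result should be enough to tell whether the resident size levels off or keeps
growing.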

[...]

>> So I have just installed the ubuntu 64bit binary
>> (http://downloads.osafoundation.org/PyLucene/linux/ubuntu64/) and the
>> memory usage now stays at:
>>
>> VIRT  RES
>> 185m  68m
>>
>> even when reindexing an entire site (before, it jumped to 600m!)
>>     
>
> Really amazing. Thanks for the info. Same gcc version over there ?
>
>   

in that case it is the version which is embedded with the binary 
distribution from pylucene.osafoundation.org (pylucene 2.1.0 ubuntu 64), 
i.e. the 20061121 gcj 4.2.0 snapshot, that's the only one that worked as 
a binary on RedHat/64.

>> also, to bind twistd to a single CPU, I start nxlucene with:
>>
>> $ taskset 1 bin/runnxlucene &
>>
>> so I believe it is one more problem solved, but all this is really edgy...
>>     
>
> Yes. GCJ use is definitely really edgy...
>
>   

also I made the mistake of starting twistd without the '-n' switch, i.e.
as a daemonized process. That gives a lot of memory leaks ...
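
On the 'taskset 1' invocation quoted above: the argument is a CPU bitmask, so
mask 1 selects CPU 0 only. A rough sketch of the same idea in Python 3,
assuming Linux, where os.sched_setaffinity is available. The helpers
mask_to_cpus and pin_current_process are hypothetical names used only for
illustration.

```python
import os


def mask_to_cpus(mask):
    """Turn a taskset-style bitmask into a set of CPU indices."""
    return {bit for bit in range(mask.bit_length()) if mask & (1 << bit)}


def pin_current_process(mask):
    """Pin the calling process to the CPUs named by the mask (Linux only)."""
    # pid 0 means "the calling process" for sched_setaffinity.
    os.sched_setaffinity(0, mask_to_cpus(mask))


print(mask_to_cpus(1))    # mask 0x1 selects CPU 0 only
print(mask_to_cpus(0x5))  # mask 0x5 selects CPUs 0 and 2
```

Using taskset from the shell is probably simpler for a server started by a
script; the in-process call is only useful if the pinning has to happen from
Python itself.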

>> By the way, Julien are you planning to port nxlucene to JBoss, with a
>> similar XML-RPC service?
>>     
>
> We just hooked up the new search engine component within Nuxeo5 which
> later on can be used as a standalone server exposing let's say a SOAP
> interface instead of an XML-RPC interface.  I would love to take time in
> the future to do this. Furthermore, instead of reimplementing my own XML
> queries and returning results as RSS streams I would implement Open
> Search nowadays (See : http://www.opensearch.org/Home)
>
> You'll have to wait a bit, though, because we still have a couple of things
> to finish before starting to work on this ;)
>
> 	J.
>   

that sounds really cool, I can help with testing, debugging, writing 
some code...

Cheers
/JM



