[ic] Restart and stop problems (multiple processes)

Mike Heins interchange-users@icdevgroup.org
Thu Feb 6 13:49:00 2003


Quoting Daniel Hutchison (jdhutchison11@attbi.com):
> > > FWIW, I did try to look into this issue in a little more depth. However,
> > > I haven't had a whole lot of time. I haven't found anything definite yet,
> > > just some suspicions.
> > > 
> > > From what I can tell, the problem seems to be in the locking of the pid
> > > file.  E.g., interchange attempts to lock the pid file when it starts up.
> > > If it can't lock the pid file, it assumes another interchange process is
> > > running.  What I suspect is that interchange locks the pid file before
> > > it forks, and that on Solaris, locks created with flock() aren't inherited
> > > across forks.  As a result, when the parent process exits, the pid file
> > > becomes unlocked.  When interchange is then run with the shutdown
> > > command, it detects that the pid file is unlocked and thinks that there
> > > isn't a running interchange process.
> > > 
> > > What I have done is verify that the default install of interchange on my
> > > Solaris box uses the flock() function to lock the pid file.  I've also
> > > created a mini Perl program that just locks files, based on the code in
> > > interchange.  The file locking works fine until I throw a fork() into
> > > it...
> > > 
> > > Anyway, I hope this helps a bit.  
> > 
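
For readers following along, the startup check described above (take a non-blocking exclusive lock on the pid file, and treat failure as "another server is running") can be sketched roughly as follows. This is an illustrative sketch in Python, not Interchange's Perl code; the function name try_lock_pid_file is hypothetical, and it assumes a Unix system where Python's fcntl.flock() is available.

```python
import fcntl
import os


def try_lock_pid_file(path):
    """Try to take a non-blocking exclusive lock on the pid file.

    Returns the open, locked file object on success, or None if some
    other open handle already holds the lock. (Hypothetical helper,
    not part of Interchange.)
    """
    fh = open(path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    # We own the lock: replace the file contents with our pid.
    fh.seek(0)
    fh.truncate()
    fh.write(f"{os.getpid()}\n")
    fh.flush()
    return fh
```

The caller has to keep the returned file object open for as long as the server runs, because closing it drops the lock. If LOCK_NB does not behave as documented on a given platform, this check either blocks or misreports, which is the kind of failure discussed in this thread.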
> > Turns out the files lock fine, but LOCK_NB is not working, at least on the
> > Solaris server I tested on (thanks Dorothy). It fails whether or not a
> > fork is involved. In any case, grab_pid happens in the context of the last
> > fork, as I thought.
> > 
> > I could add a -badlock option on the command line, but it would seem
> > to make sense to just fix Perl on the affected systems.
> 
> Looks like we were both wrong...  I was able to track the problem down
> to the read_pid() that was being called from the server_start_message()
> in lib/Vend/Server.pm.  It appears that read_pid() is releasing the lock
> when it opens and closes the pid file.
> 
> Anyway, here is a simple patch that fixes the problem on Solaris:
> 

Oh, and by the way -- thank you for the very competent debugging. 8-)
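
For the archive, the effect Daniel describes (a lock disappearing when the same process opens and closes the pid file a second time) can be reproduced with POSIX fcntl()-style record locks, which a process loses as soon as it closes any descriptor for the locked file. Whether Perl's flock() is built on fcntl() on the affected Solaris systems is an assumption here, not something confirmed in this thread. The sketch below is Python rather than Perl, and the function name lock_survives_reread is hypothetical.

```python
import fcntl
import os


def lock_survives_reread(path, use_posix_lock):
    """Lock `path`, re-open and close it the way a read_pid-style helper
    might, then fork a child to probe whether the lock is still held.

    Returns True if the lock survived the second open/close.
    (Hypothetical helper for illustration only.)
    """
    lock = fcntl.lockf if use_posix_lock else fcntl.flock

    holder = open(path, "a+")
    lock(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)

    # Read the file through a second handle and close it again.
    with open(path) as reader:
        reader.read()

    pid = os.fork()
    if pid == 0:
        # Child: try to take the same kind of lock on a fresh handle.
        probe = open(path, "a+")
        try:
            lock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            os._exit(1)  # lock is still held by the parent
        os._exit(0)      # lock was free, so the parent had lost it

    _, status = os.waitpid(pid, 0)
    holder.close()
    return os.WEXITSTATUS(status) == 1
```

On a typical Linux system, calling this with use_posix_lock=False should report the lock as still held, and with use_posix_lock=True as released; other platforms may differ. A fix along these lines would read the pid through the handle that already holds the lock instead of opening the file again.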

-- 
Mike Heins
Perusion -- Expert Interchange Consulting    http://www.perusion.com/
phone +1.513.523.7621      <mike@perusion.com>

Experience is what allows you to recognize a mistake the second
time you make it. -- unknown