Asset and Grid Robust server just exits.

Mike Dickson
This seems to be load-related.  They'll run overnight, and in late morning
one will crash, with the other usually a minute or two behind it.  I've
investigated lots of possibilities.  Plenty of file handles are available,
and it's not ephemeral port exhaustion.  I have systemd set up to
auto-restart the processes, and systemd reports an exit status of 1.  But
the PID file isn't cleaned up and there are no crash files.  It happens
regardless of the Mono version I run.
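
For anyone wanting to repeat the file-handle check, here is a minimal Python sketch. It assumes a Linux host with /proc mounted; the function names are illustrative and not part of OpenSimulator. It compares the number of descriptors a process has open against its "Max open files" limit.

```python
import os
import re


def parse_max_open_files(limits_text):
    """Return (soft, hard) 'Max open files' values from /proc/<pid>/limits text.

    Values are kept as strings because a limit may read 'unlimited'.
    """
    for line in limits_text.splitlines():
        match = re.match(r"Max open files\s+(\S+)\s+(\S+)", line)
        if match:
            return match.group(1), match.group(2)
    return None


def fd_usage(pid):
    """Return (open descriptor count, (soft, hard) limits) for a process (Linux only)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as handle:
        limits = parse_max_open_files(handle.read())
    return open_fds, limits
```

Calling `fd_usage(pid)` with the PID of the Robust process has to be done as the same user or as root, since /proc/<pid>/fd is not readable otherwise.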

 

Basically I have a main grid server with all of the services configured.
The grid is HG enabled.  This has been running fine for a while, but I
recently added a bunch of regions and load, so I added a secondary asset
service (port 8004) and inventory service (port 8005).  The original process
still has all the normal services running on the 8002/8003 public/private
ports.  The regions get their assets and inventory from the private services
in the separate processes (the 8004/8005 ones).  FWIW the inventory service
is rock solid; it never crashes.  So this is somehow related to assets.  I
assume the HG connections go through the HG connector and get serviced in
the grid server.

Does this ring a bell for anyone?  I'm running out of things to try.

Mike

_______________________________________________
Opensim-dev mailing list
[hidden email]
http://opensimulator.org/cgi-bin/mailman/listinfo/opensim-dev

Re: Asset and Grid Robust server just exits.

Kevin Cozens
On 2018-11-15 11:51 a.m., Mike Dickson wrote:
> They'll run overnight and late morning one will crash
[snip]
> The grid is HG enabled. This has been running fine for a while but I
> recently added a bunch of regions

Check for error messages in the log files of the regions that crash. The
other thing to check is whether you might be running out of system memory.

--
Cheers!

Kevin.

http://www.ve3syb.ca/               | "Nerds make the shiny things that
https://www.patreon.com/KevinCozens | distract the mouth-breathers, and
                                     | that's why we're powerful"
Owner of Elecraft K2 #2172          |
#include <disclaimer/favourite>     |             --Chris Hardwick

Re: Asset and Grid Robust server just exits.

Mike Dickson
No regions are crashing.  Everything else has run for days with no issues at
all.  The only things that crash are the processes running assets.  The main
reason I added an additional one was to prove out being able to scale assets
to more than one server; I'd eventually like to add a redundant server to
the mix.  System memory is fine: lots free and about 12 GB in system cache.
The system has never paged.

The only things that die are the asset servers (2 processes: the original
one with all the grid services in it, and a second with just assets).  FWIW
the split-out inventory process NEVER dies.  It's run fine since I broke
them out, with the same code, Mono, etc.

And to be complete: there is NEVER a crash file.  Mono doesn't crash.  There
is no backtrace.  Systemd indicates the process exited with an exit code of
1 and the PID file is still there.
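
A leftover PID file can be checked against the process table to confirm that it is stale. The following is a minimal Python sketch under the assumption that the PID file holds a single integer; the function name is hypothetical and the PID file path depends on the local setup.

```python
import os


def pid_file_is_stale(path):
    """Return True if the PID recorded in the file no longer refers to a live process."""
    with open(path) as handle:
        pid = int(handle.read().strip())
    try:
        # Signal 0 delivers nothing; it only checks existence and permission.
        os.kill(pid, 0)
    except ProcessLookupError:
        return True
    except PermissionError:
        # The process exists but belongs to another user.
        return False
    return False
```

A stale file only shows that the process ended without running its normal cleanup; it does not say why.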

Thanks for answering and offering some help.  If I can't identify another
cause I'll likely fall back to a single server and revisit this in a test
setup.

Mike


Re: Asset and Grid Robust server just exits.

Kevin Cozens
On 2018-11-15 5:44 p.m., Mike Dickson wrote:
> No regions are crashing.

I'm usually helping people who are having problems with regions, so I typed
that by mistake. I was suggesting you check the log files for the Asset and
Robust instances.


Re: Asset and Grid Robust server just exits.

Haravikk
In reply to this post by Mike Dickson


> On 15 Nov 2018, at 22:44, Mike Dickson <[hidden email]> wrote:
>
> No regions are crashing.  Everything else has run for days with no issues at
> all.  The only thing that crashes is the processes running assets.  The main
> reason I added an additional one was to be able to prove out being able to
> scale Assets to more than one server. I'd eventually like to add a redundant
> server to the mix.  So system memory is fine. Lots free and about 12gb in
> system cache.  The system has never paged.
>
> The only thing that dies is the asset servers (2 processes, the original one
> with all the grid services in it and a second with just assets).  FWIW the
> split Inventory code in that instance NEVER dies.  Its run fine since I
> broke them out.  So same code/mono, etc.  
>
> And to be complete NEVER a crash file.  Mono doesn't crash. There is no
> backtrace.  Systemd is indicating the process exited with an exit code of 1
> and the PID file is still there.
>
> Thanks for answering and offering some help.  If I can't identify another
> cause I'll likely fall back to a single server and revisit this in a test
> setup.
>
> Mike

I'm no expert on running multiple asset servers, but if you can't find anything in your logs, it might be useful to share some of the configuration for the servers themselves. Stripped of anything sensitive, of course, and focused on just the asset-server-specific parts, but it might help someone identify a misconfiguration.

I just wonder if maybe the servers are somehow configured in a way that they're conflicting rather than cooperating? Like I say, I'm unfamiliar with running your type of setup, but usually the more information you can give, the better the chance that people can come up with possible fixes.

One other thing I'm unclear on: are you running two separate asset server instances on a single system? I know your intention is to use separate systems in future, but if you're just testing on the same machine at the moment, then I wonder if there might be some other shared resource that's causing the problem, such as a lock file or something else that asset servers might store locally and both be trying to access at the same time (not sure what, I'm afraid, as I only run one myself).

Re: Asset and Grid Robust server just exits.

Mike Dickson
In reply to this post by Kevin Cozens
Yes, I absolutely checked the logs.  That's interesting as well: no errors
or even shutdown messages, just the startup sequence on the restart.  As I
mentioned, systemd is reporting an exit code of 1 and the PID file remains.

Re: Asset and Grid Robust server just exits.

Cinder Roxley
In reply to this post by Mike Dickson
On November 15, 2018 at 10:52:02 AM, Mike Dickson ([hidden email]) wrote:

> Seems based on load. They'll run overnight and late morning one will crash
> w/the other usually a minute or 2 behind it. I've investigated lots of
> possibilities. Plenty of file handles available. It's not ephemeral port
> exhaustion. I have systemd set up to auto restart the processes. Systemd
> reports an exit status of 1. But the pid file isn't cleaned up and there
> are no crashes. Happens regardless of the mono version I run.

Which asset service are you running? There are three, iirc. FSAssets has
particularly nasty resource leaks. These lead to memory exhaustion and
eventually the OOM killer puts a bullet in its head. Check if this is the
case via /var/log/messages. This is also why I had suggested trying my
alternative asset service written in Go which uses a fraction of the memory
and scales out quite nicely.
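
The OOM-killer check can be scripted. This is a minimal Python sketch, assuming the kernel log lines contain the phrase "Killed process <pid> (<command>)"; the exact wording varies between kernel versions, and the function name is illustrative.

```python
import re

# Assumed wording: "... Killed process 4321 (mono) total-vm:...".
# Adjust the pattern if the local kernel logs it differently.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Yield (pid, command) for every line that records an OOM kill."""
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            yield int(match.group(1)), match.group(2)
```

Pass it any iterable of log lines, for example an open file handle on /var/log/messages, or the captured output of `journalctl -k` on hosts that only keep the systemd journal.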

Re: Asset and Grid Robust server just exits.

Mike Dickson
I'm running FSAssets, yes.  And yes, that's the behavior I think I'm
seeing, though it's not exhausting physical memory, just virtual.
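
To tell virtual from physical memory growth, the VmSize (virtual) and VmRSS (resident) fields of /proc/<pid>/status can be sampled over time. This is a minimal Python sketch assuming a Linux host; the function names are illustrative.

```python
def parse_vm_fields(status_text):
    """Extract VmSize and VmRSS, in kB, from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The value looks like "\t 8123456 kB"; keep only the number.
            fields[key] = int(value.split()[0])
    return fields


def read_vm_fields(pid):
    """Read the current VmSize and VmRSS of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_fields(handle.read())
```

Logging `read_vm_fields(pid)` every few minutes would show whether VmSize keeps climbing while VmRSS stays flat in the run-up to an exit.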

Cinder, I pulled down your repo and built it, and checked the help text.  It
looks like it should work over the top of the FSAssets data I already
have?  I may set up a test environment first, but if that's the case I'm
happy to give it a go (pun intended).

Mike

Re: Asset and Grid Robust server just exits.

Cinder Roxley
Yes, it’s meant to run as a drop-in replacement for the FSAssets service, to
avoid hellish migration times. I have a version utilizing Ceph for
distributed storage, but the version on GitHub is just meant as an FSAssets
replacement for precisely these issues.


Re: Asset and Grid Robust server just exits.

Michel Beauregard
Hi, just some additional information.

I have been running FSAssets since the OSgrid asset service crashed after a RAID failure in 2015. It was a bit tricky to get it running as a separate service with its own Robust instance because of Hypergrid considerations in asset access. It ran under Ubuntu 15.04 with Robust version 0821 and is still running under 16.04 with Robust version 090.

I also implemented FSAssets in Lisa Baxton IMA metaverse depot under Ubuntu Server 16.04.4 with Robust/OpenSim version 0821. There is an issue with FSAssets database handling in that version; see Mantis http://opensimulator.org/mantis/view.php?id=7794 . The patch included at that link needs to be applied to release version 0821 to correct the problem. If you need a tested, compiled version of that fix, don't hesitate to contact me.

Hope that is helpful.


GiMiSa

On Friday, November 16, 2018 at 11:02:06 a.m. EST, Cinder Roxley <[hidden email]> wrote:

> Yes, it’s meant to run as a drop-in replacement for fsassets service to
> avoid hellish migration times. I have a version utilizing ceph for
> distributed storage, but the version on GitHub is just meant as an fsasset
> replacement for precisely these issues.

Re: Asset and Grid Robust server just exits.

Mike Dickson
In reply to this post by Cinder Roxley
Cinder, do you by chance know what the heck goes in the [AssetService]
section (and in Services, for that matter) when you're configuring the
AssetService to run on a separate port?  I have things working for local
region access because the regions can get to the private asset-specific
port, but the HGAssetService doesn't know where to find it, so suitcase
operations are failing.  And none of this is documented, as far as I can
tell.

Otherwise the Go based AssetService looks like an awesome solution!

Mike

