drop gold server crash?

silicontrip
Human Zombie
Posts: 28
Joined: Sun 12.12.2010, 23:26
Location: Melbourne, Australia

drop gold server crash?

Post by silicontrip » Sun 12.12.2010, 23:37

I'm running a private MAngband server, version 1120, which I built from a tarball on an amd64 Linux system.
The clients are the OS X builds downloaded from the official page, both Intel and PPC. Sorry, I don't know their version.

Whenever I drop some gold, the server crashes.
I found a topic mentioning that this was fixed in an earlier version.

Is it a client or server issue?
Could someone help me resolve it?

Thanks
Mark
--
Project Mangband OSX Carbon Developer.

Re: drop gold server crash?

Post by silicontrip » Wed 22.12.2010, 05:01

I was able to get this GDB output:

Code:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000468472 in do_cmd_drop_gold (Ind=0, amt=1) at server/cmd3.c:669
669             if ((p_ptr->lev == 1) && (cfg_newbies_cannot_drop))
(gdb) bt
#0  0x0000000000468472 in do_cmd_drop_gold (Ind=0, amt=1) at server/cmd3.c:669
#1  0x00000000004adc8e in Receive_drop_gold (ind=0) at server/netserver.c:5029
#2  0x00000000004a5ba4 in process_pending_commands (ind=0)
    at server/netserver.c:1722
#3  0x00000000004a5e4f in Handle_input (fd=7, arg=0) at server/netserver.c:1874
#4  0x0000000000485bc1 in sched () at server/sched.c:485
#5  0x000000000049959e in play_game (new_game=0 '\000')
    at server/dungeon.c:2493
#6  0x00000000004af2a3 in main (argc=0, argv=0x7fffffffe620)
    at server/main.c:563

Do any of the developers read this list?
Mark
--
Project Mangband OSX Carbon Developer.

Re: drop gold server crash?

Post by silicontrip » Wed 22.12.2010, 06:17

I've traced the error to the Packet_scanf function overwriting the player variable, which sets player to 0 and causes the server crash.

I don't know enough about variable-argument functions to understand how Packet_scanf works or how it ends up overwriting the wrong variable.

I've put in a workaround by copying the player variable to another variable and copying it back before calling do_cmd_drop_gold, but I don't know what else is being overwritten.

I also assume that this is an amd64 bug, as I compiled it on my MacBook Pro and it did not crash. (Or it could be related to the variable-argument handling, STDVA.)

Any suggestions?
--
Project Mangband OSX Carbon Developer.

Flambard
King Vampire
Posts: 258
Joined: Wed 20.06.2007, 10:49

Re: drop gold server crash?

Post by Flambard » Tue 28.12.2010, 11:44

Jeez, I can't believe the gold drop bug is still alive. It might still be related to the wrong primitive type being used during encoding/decoding of network packets (Packet_scanf)? I'll see if it's reproducible on amd64 Linux.

PowerWyrm
Balrog
Posts: 1574
Joined: Sun 27.11.2005, 15:57

Re: drop gold server crash?

Post by PowerWyrm » Fri 31.12.2010, 17:09

Packet_printf/scanf are a mess. There's a ticket about them in the bug database (#822). The problems mentioned on that ticket could lead to memory being incorrectly overwritten and to crashes.

Re: drop gold server crash?

Post by silicontrip » Tue 08.02.2011, 02:41

The right primitive is used. I did look at bug #855 and checked whether that was the case with the netserver code; however, it's not.

Somehow (I haven't worked out how yet) Packet_scanf is overwriting 2 ints. It correctly writes to the first but then also writes 0 to the next 4 bytes, which leads to the crash.

The compiler puts player immediately after amt in memory, so player is being overwritten. By putting another int after amt, I've worked around this bug.

I have also checked the addresses with the Intel OS X compiler: player is placed after amt there too, but it is not overwritten in that case. It looks like a quirk in Packet_scanf with amd64 gcc.

This shows the addresses of all the variables just before the scanf in drop_gold. You can see how my 3 padding variables, each initialised to ffffffff, sit around the amt variable.

Code:

DEBUG: drop gold  ch aaa8068f amt aaa80678 player aaa80688 n aaa80684 pad aaa80674 aaa8067c aaa80680
DEBUG: drop gold pad ffffffff ffffffff ffffffff

The following are printed immediately after the va_arg calls in Packet_scanf. As you can see, the addresses are the same as those of ch and amt. It does not write to aaa8067c at this point.

Code:

char@aaa8068f
long@aaa80678

Here we can see that the second padding variable, at aaa8067c, has been overwritten.

Code:

DEBUG: drop gold pad ffffffff 0 ffffffff
080211 131708 Player: You drop 1 pieces of gold.

This might be happening at other places in the code, and it's just luck that the overwritten variable isn't one that matters there.
Mark
--
Project Mangband OSX Carbon Developer.

Re: drop gold server crash?

Post by silicontrip » Tue 08.02.2011, 03:21

On an amd64 Linux machine, longs are 8 bytes, not 4.
An s32b is only 4 bytes, so it does not match the %ld format string. This is more like bug #822 than I first thought.

I'm going to do a code audit for uses of %ld with 4-byte storage types.
--
Project Mangband OSX Carbon Developer.
