• core in term

    From Nigel Reed@1:103/705 to GitLab issue in main/sbbs on Wed Sep 13 17:11:05 2023
    open https://gitlab.synchro.net/main/sbbs/-/issues/630

    Unfortunately, this happened 3 days ago and couldn't tell you what I was trying to do at the time, if anything.
    ~~~
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    --Type <RET> for more, q to quit, c to continue without paging--c
    Core was generated by `/sbbs/exec/sbbs d'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0 getnextevent (cfg=0x7f9577b58bc8, event=0x7f94ffbfd3b0) at data.cpp:153
    153 if(!cfg->event[i]->node || cfg->event[i]->node>cfg->sys_nodes
    [Current thread is 1 (Thread 0x7f94ffbff640 (LWP 1531491))]
    (gdb) bt
    #0 getnextevent (cfg=0x7f9577b58bc8, event=0x7f94ffbfd3b0) at data.cpp:153
    #1 0x00007f958c837076 in sbbs_t::gettimeleft (this=0x7f9577b58800, handle_out_of_time=true) at data.cpp:190
    #2 0x00007f958c861e5f in sbbs_t::getkey (this=0x7f9577b58800, mode=1) at getkey.cpp:110
    #3 0x00007f958c845fe1 in sbbs_t::exec (this=0x7f9577b58800, csi=0x7f9577b6aad8) at exec.cpp:1855
    #4 0x00007f958c96193c in node_thread (arg=0x7f9577b58800) at main.cpp:4305
    #5 0x00007f958c4b9b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
    #6 0x00007f958c54ba00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
    ~~~
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Fri Sep 15 12:28:18 2023
    https://gitlab.synchro.net/main/sbbs/-/issues/630#note_4051

    Looking at the variables with gdb with Nigel, it was apparent that one of the cfg->event[] pointers can become NULL (not the last one, the 7th of 22). So something either corrupted the cfg struct or was in the process of freeing the configuration. We should've checked the lower numbered cfg->event[] elements, but didn't think of that.
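
As an illustration of the failure mode described above, here is a minimal sketch of a loop that tolerates NULL entries in an array of event pointers. The `Event` and `Config` types and the `count_runnable_events` function are hypothetical stand-ins, not Synchronet's actual definitions, and skipping an entry when the data.cpp:153 condition is true is an assumption. A check like this would only avoid the NULL dereference; it would not address whatever cleared or freed the entry in the first place.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the real configuration types.
struct Event {
    int node;  // node number the event is assigned to (0 = none)
};

struct Config {
    int sys_nodes;              // number of configured nodes
    std::vector<Event*> event;  // event pointers; an entry may be null
};

// Counts events assigned to a valid node, skipping null entries
// instead of dereferencing them.
std::size_t count_runnable_events(const Config& cfg) {
    std::size_t count = 0;
    for (const Event* ev : cfg.event) {
        if (ev == nullptr)
            continue;  // dereferencing this entry would raise SIGSEGV
        // Same test as the line gdb shows at data.cpp:153;
        // skipping the entry when it is true is an assumption.
        if (!ev->node || ev->node > cfg.sys_nodes)
            continue;
        ++count;
    }
    return count;
}
```

For example, with `sys_nodes` set to 2 and four entries (node 1, NULL, node 5, node 0), only the first entry is counted.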
  • From Nigel Reed@1:103/705 to GitLab note in main/sbbs on Mon Sep 18 22:10:01 2023
    https://gitlab.synchro.net/main/sbbs/-/issues/630#note_4187

    Got another one:
    ~~~
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    --Type <RET> for more, q to quit, c to continue without paging--c
    Core was generated by `/sbbs/exec/sbbs d'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0 getnextevent (cfg=0x7ff15c963248, event=0x7ff0c6bfc3b0) at data.cpp:153
    153 if(!cfg->event[i]->node || cfg->event[i]->node>cfg->sys_nodes
    [Current thread is 1 (Thread 0x7ff0c6bfe640 (LWP 1908503))]
    (gdb) bt
    #0 getnextevent (cfg=0x7ff15c963248, event=0x7ff0c6bfc3b0) at data.cpp:153
    #1 0x00007ff16df0c080 in sbbs_t::gettimeleft (this=0x7ff15c962e80, handle_out_of_time=true) at data.cpp:190
    #2 0x00007ff16df36e77 in sbbs_t::getkey (this=0x7ff15c962e80, mode=1) at getkey.cpp:110
    #3 0x00007ff16df1aff3 in sbbs_t::exec (this=0x7ff15c962e80, csi=0x7ff15c975168) at exec.cpp:1855
    #4 0x00007ff16e0368ee in node_thread (arg=0x7ff15c962e80) at main.cpp:4298
    #5 0x00007ff16db8db43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
    #6 0x00007ff16dc1fa00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
    ~~~
    Pretty much the same.
  • From Nigel Reed@1:103/705 to GitLab note in main/sbbs on Mon Sep 18 22:15:08 2023
    https://gitlab.synchro.net/main/sbbs/-/issues/630#note_4188

    Not sure if this will help. Let me know if you want me to try anything else.
    (gdb) print i
    $1 = 0
    (gdb) print cfg.total_events
    $2 = 25
    (gdb) print cfg-event[0]
    Can't do that binary op on that type
    (gdb) print cfg->event[0]
    $3 = (event_t *) 0x47004700490053
    (gdb) print cfg->event[1]
    $4 = (event_t *) 0x4300450044005f
    (gdb) print cfg->event[2]
    $5 = (event_t *) 0x4100520041004c
    (gdb) print cfg->event[3]
    $6 = (event_t *) 0x4e004f00490054
    (gdb) print cfg->event[4]
    $7 = (event_t *) 0x41004d0046004f
    (gdb) print cfg->event[5]
    $8 = (event_t *) 0x490052004f004a
    (gdb) print cfg->event[6]
    $9 = (event_t *) 0x43005f00590054
    (gdb) print cfg->event[7]
    $10 = (event_t *) 0x54004e0055004f
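
One editorial observation about these values, not something stated in the thread: each one consists of four 16-bit units with a zero high byte, which is what ASCII text looks like when stored as little-endian UTF-16. The sketch below reinterprets the eight values that way; `decode_utf16le_words` and `kEventPointers` are hypothetical names introduced here for illustration. On a little-endian x86_64 host the low 16 bits of each value come first in memory, so the function reads units from least to most significant.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Treats each 64-bit value as four little-endian UTF-16 code units and
// returns the printable ASCII text; any other unit is shown as '?'.
std::string decode_utf16le_words(const std::vector<std::uint64_t>& words) {
    std::string out;
    for (std::uint64_t w : words) {
        for (int i = 0; i < 4; ++i) {
            // Low-order unit first: that is its position in memory on x86_64.
            std::uint16_t unit =
                static_cast<std::uint16_t>((w >> (16 * i)) & 0xFFFF);
            out += (unit >= 0x20 && unit < 0x7F) ? static_cast<char>(unit) : '?';
        }
    }
    return out;
}

// The eight cfg->event[] values printed by gdb in the message above.
const std::vector<std::uint64_t> kEventPointers = {
    0x47004700490053, 0x4300450044005f, 0x4100520041004c, 0x4e004f00490054,
    0x41004d0046004f, 0x490052004f004a, 0x43005f00590054, 0x54004e0055004f,
};
```

Calling `decode_utf16le_words(kEventPointers)` yields `SIGG_DECLARATIONOFMAJORITY_COUNT`. That would be consistent with the array's memory holding string data rather than `event_t` pointers at the time of the crash, although the decoding alone does not show how the text got there.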
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Tue Sep 19 11:33:25 2023
    https://gitlab.synchro.net/main/sbbs/-/issues/630#note_4195

    In this case, we're not seeing any NULL'd cfg->event[] items. I still suspect something was in the process of freeing the cfg struct. The only thing that should do that (for a terminal node's config copy) is cleanup() in main.cpp, which is only called upon shutdown or recycle, and even then, it first waits (up to 60 seconds) for all node threads to terminate. The message "Waiting for X node threads to terminate..." is logged (info-level) before this wait (and "Done waiting for node threads to terminate" is logged after the wait). Can you check and see if you have any such log messages around the time of these crashes?

    Lastly, just before the node's config structs are freed, the info-level message "Terminal Server thread terminating" is logged. Please check and see if that message is logged around the time of the crashes. Thanks.
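
The shutdown ordering described above, a bounded wait for node threads followed by freeing their configuration, can be sketched roughly as follows. `NodeThreadTracker` and its methods are hypothetical and are not Synchronet's actual implementation; the sketch only illustrates why the outcome of a time-limited wait matters before shared data is released.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical tracker for the number of node threads still running.
class NodeThreadTracker {
public:
    void thread_started() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++running_;
    }

    void thread_finished() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            --running_;
        }
        cv_.notify_all();  // wake any thread blocked in wait_for_all()
    }

    // Blocks until no node threads remain or the timeout expires.
    // Returns true only if every node thread has finished, i.e. only
    // then is it safe to free configuration those threads were using.
    bool wait_for_all(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return running_ == 0; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int running_ = 0;
};
```

If `wait_for_all` returns false, at least one node thread is still running, and freeing its copy of the configuration at that point would leave it reading released memory.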
  • From Nigel Reed@1:103/705 to GitLab note in main/sbbs on Tue Sep 19 14:31:54 2023
    https://gitlab.synchro.net/main/sbbs/-/issues/630#note_4196

    -rw------- 1 bbs bbs 1521917952 Sep 17 07:35 '/tmp/core.sbbs!termNode.1907853'
    ~~~
    Sep 17 07:35:21 bbs synchronet: term Node 1 JavaScript: Creating node runtime: 134217728 bytes
    Sep 17 07:35:21 bbs synchronet: term Node 1 07:35 Sun Sep 17 2023 Node 1
    Sep 17 07:35:21 bbs synchronet: term Node 1 Telnet <no name> [221.168.89.195]
    Sep 17 07:35:21 bbs synchronet: term Node 2 no Telnet commands received, reverting to Raw TCP mode
    Sep 17 07:35:21 bbs synchronet: term Node 2 terminal type: 80x24 DUMB
    Sep 17 07:35:56 bbs synchronet: term Synchronet Terminal Server Version 3.20a Debug
    Sep 17 07:35:56 bbs synchronet: term Compiled master/f28c4bc94 Sep 17 2023 01:37:18 with GCC 11.4.0
    Sep 17 07:35:56 bbs synchronet: term sizeof: int=4, long=8, off_t=8, time_t=8
    Sep 17 07:35:56 bbs synchronet: term Initializing on Sun Sep 17 07:35:56 2023 with options: 1022
    Sep 17 07:35:56 bbs synchronet: term Loading configuration files from /sbbs/ctrl/
    Sep 17 07:35:56 bbs synchronet: term MQTT lib: mosquitto 2.0.11
    Sep 17 07:35:56 bbs synchronet: term MQTT connecting to broker 127.0.0.1:1883
    Sep 17 07:35:56 bbs synchronet: term MQTT broker-connect (127.0.0.1:1883) successful
    Sep 17 07:35:56 bbs synchronet: term Verifying/creating data directories
    Sep 17 07:35:56 bbs synchronet: term Verifying/creating node directories
    Sep 17 07:35:56 bbs synchronet: term Telnet Server listening on socket 0.0.0.0 port 23
    ~~~
    There have been no "Done waiting for node threads" messages since Sep 15.
  • From Nigel Reed@1:103/705 to GitLab note in main/sbbs on Tue Sep 19 18:29:59 2023
    https://gitlab.synchro.net/main/sbbs/-/issues/630#note_4200

    Here's node 3 activity:
    ~~~
    Sep 17 02:11:58 bbs synchronet: term Node 3 <Crashtestdummy> Scrolled 8 mouse hot-spots 1 rows (8 remain)
    Sep 17 02:11:58 bbs synchronet: term Node 3 <Crashtestdummy> Scrolled 8 mouse hot-spots 1 rows (8 remain)
    Sep 17 02:14:15 bbs synchronet: term Node 3 <Crashtestdummy> Scrolled 8 mouse hot-spots 1 rows (8 remain)
    Sep 17 02:14:15 bbs synchronet: term Node 3 <Crashtestdummy> Invoked string command: EVAL user.editor
    Sep 17 02:14:15 bbs synchronet: term Node 3 <Crashtestdummy> Scrolled 8 mouse hot-spots 1 rows (8 remain)
    Sep 17 02:14:15 bbs synchronet: term Node 3 <Crashtestdummy> Scrolled 8 mouse hot-spots 1 rows (8 remain)
    Sep 17 07:34:46 bbs synchronet: term Node 3 Loading configuration files from /sbbs/ctrl/
    Sep 17 07:34:46 bbs synchronet: term Node 3 constructor using socket 121 (settings=8212)
    Sep 17 07:34:46 bbs synchronet: term Node 3 temporary file directory: /sbbs/node3/temp/
    ~~~
    and then just around the time of the crash:
    ~~~
    Sep 17 07:35:21 bbs synchronet: term Node 2 no Telnet commands received, reverting to Raw TCP mode
    Sep 17 07:35:21 bbs synchronet: term Node 2 terminal type: 80x24 DUMB
    Sep 17 07:35:56 bbs synchronet: term Synchronet Terminal Server Version 3.20a Debug
    Sep 17 07:35:56 bbs synchronet: term Compiled master/f28c4bc94 Sep 17 2023 01:37:18 with GCC 11.4.0
    ~~~
  • From Nigel Reed@1:103/705 to GitLab note in main/sbbs on Sat Dec 16 11:24:53 2023
    https://gitlab.synchro.net/main/sbbs/-/issues/630#note_4563

    I have not seen any more issues since this commit, so feel free to close this one out if you're happy that it's fixed.
  • From Rob Swindell@1:103/705 to GitLab issue in main/sbbs on Sat Dec 16 12:55:56 2023
    close https://gitlab.synchro.net/main/sbbs/-/issues/630