From owner-state-threads@oss.sgi.com Tue Oct 2 11:32:54 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f92IWss21902 for state-threads-outgoing; Tue, 2 Oct 2001 11:32:54 -0700
Received: from rj.sgi.com (rj.SGI.COM [204.94.215.100]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f92IWpD21899 for ; Tue, 2 Oct 2001 11:32:51 -0700
Received: from trudge.engr.sgi.com (trudge.engr.sgi.com [130.62.176.82]) by rj.sgi.com (8.11.4/8.11.4/linux-outbound_gateway-1.0) with ESMTP id f92IWkL03644 for ; Tue, 2 Oct 2001 11:32:46 -0700
Received: (from mja@localhost) by trudge.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) id LAA06881 for state-threads@oss.sgi.com; Tue, 2 Oct 2001 11:30:29 -0700 (PDT)
From: mja@trudge.engr.sgi.com (Mike Abbott)
Message-Id: <200110021830.LAA06881@trudge.engr.sgi.com>
Subject: Resend: Re: st_netfd_poll ...
To: state-threads@oss.sgi.com
Date: Tue, 2 Oct 2001 11:30:28 -0700 (PDT)
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-state-threads@oss.sgi.com
Precedence: bulk

[Resending. Originally sent 9/25 but it bounced!]

> If we complete a partial write and then timeout, we return -1. This
> doesn't give the caller any chance to update their data pointers and
> re-issue the call.

Except for timeouts, network I/O errors are usually unrecoverable, meaning that trying again won't help. I agree that when a network I/O operation times out after a partial success there ought to be a way to determine how much succeeded. As Gene suggested, write your own -- and then contribute it to the project for all to share.

[Actually I have coded up a version of this in the interim but haven't tried it yet. Anyone want to test it for me?]

However this is not John Val's original issue, which was that the timeout seems to trigger early. John, can you post a minimal program that exhibits the undesirable behavior so we can diagnose it?
Can you make it happen with just a single state thread? Alternatively you could trace through the execution of the ST code yourself, watching for erroneous behavior, and ask for more specific help.

--
Michael J. Abbott mja@sgi.com www.repbot.org/mike

From owner-state-threads@oss.sgi.com Thu Oct 4 14:37:34 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f94LbYU23260 for state-threads-outgoing; Thu, 4 Oct 2001 14:37:34 -0700
Received: from zok.sgi.com (zok.sgi.com [204.94.215.101]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f94LbWD23257 for ; Thu, 4 Oct 2001 14:37:32 -0700
Received: from trudge.engr.sgi.com (trudge.engr.sgi.com [130.62.176.82]) by zok.sgi.com (8.11.4/8.11.4/linux-outbound_gateway-1.0) with ESMTP id f94LbQK17839 for ; Thu, 4 Oct 2001 14:37:26 -0700
Received: (from mja@localhost) by trudge.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) id OAA24403 for state-threads@oss.sgi.com; Thu, 4 Oct 2001 14:35:08 -0700 (PDT)
From: mja@trudge.engr.sgi.com (Mike Abbott)
Message-Id: <200110042135.OAA24403@trudge.engr.sgi.com>
Subject: Re: Code Re: Resend: Re: st_netfd_poll ...
To: state-threads@oss.sgi.com
Date: Thu, 4 Oct 2001 14:35:07 -0700 (PDT)
In-Reply-To: <01100415052800.01075@bergsee.bio.uva.nl> from "John Val" at Oct 04, 2001 03:05:28 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-state-threads@oss.sgi.com
Precedence: bulk

[John Val sent an example program to me and to the list, but the list rejected his too-large letter. This is my response to that letter, made public so others can contribute if desired.]

John, thanks for the code. Got it to run.
> Typically I get lines in errors like
> [[04/Oct/2001:14:25:33] WARN: can't write line 1451 within, call took 3658 of
> requested 10000 microseconds trying to write to 127.0.0.1: st_write: Timer
> expired

I see these errors too but the elapsed time is always greater than the 10000 usec timeout. Looks like your early timeouts are not a state-threads issue but an issue with the system on which you run the server. Can you try running the server on a different kind of system (different operating system, in particular) to see if the problem shows up there? Perhaps if you elaborate on the kind of system that exhibits the erroneous behavior someone will be able to say, "yeah, that O/S has timing problems, apply this patch...."

--
Michael J. Abbott mja@sgi.com www.repbot.org/mike

From owner-state-threads@oss.sgi.com Mon Oct 8 02:48:29 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f989mTx10181 for state-threads-outgoing; Mon, 8 Oct 2001 02:48:29 -0700
Received: from mail.science.uva.nl (mail.science.uva.nl [146.50.4.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f989mPD10178 for ; Mon, 8 Oct 2001 02:48:26 -0700
Received: from bergsee.bio.uva.nl [145.18.171.114] by mail.science.uva.nl with SMTP (sendmail 8.11.6/config 11.18). id f989kCc02916; Mon, 8 Oct 2001 11:46:12 +0200 (MEST)
X-Organisation: Faculty of Science, University of Amsterdam, The Netherlands
X-URL: http://www.science.uva.nl/
Content-Type: text/plain; charset="iso-8859-1"
From: John Val
To: mja@trudge.engr.sgi.com (Mike Abbott), state-threads@oss.sgi.com
Subject: Re: Code Re: Resend: Re: st_netfd_poll ...
Date: Mon, 8 Oct 2001 11:44:54 +0200
X-Mailer: KMail [version 1.2]
References: <200110042135.OAA24403@trudge.engr.sgi.com>
In-Reply-To: <200110042135.OAA24403@trudge.engr.sgi.com>
MIME-Version: 1.0
Message-Id: <01100811445400.08009@bergsee.bio.uva.nl>
Content-Transfer-Encoding: 8bit
Sender: owner-state-threads@oss.sgi.com
Precedence: bulk

Hi Mike

Thanks for testing.
Luckily you did not see the error. However, I still do. I am running Linux-2.4.3 on a Dell Inspiron 7500 with an 800 MHz processor. Is anyone else seeing this problem?

Thanks,
John

--
Dr. John Val
Population Biology section
Instituut voor Biodiversiteit en Ecosysteem Dynamica
Faculteit der Natuurwetenschappen, Wiskunde en Informatica
University of Amsterdam
Kruislaan 320
1098SM Amsterdam
Telephone: () 31 20 5257768
E-mail: val@science.uva.nl
E-mail: j.val@hccnet.nl (personal)
home: http://home.hccnet.nl/j.val

From owner-state-threads@oss.sgi.com Wed Oct 10 14:38:33 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9ALcXb32379 for state-threads-outgoing; Wed, 10 Oct 2001 14:38:33 -0700
Received: from zok.sgi.com (zok.sgi.com [204.94.215.101]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9ALcVD32376 for ; Wed, 10 Oct 2001 14:38:31 -0700
Received: from trudge.engr.sgi.com (trudge.engr.sgi.com [130.62.176.82]) by zok.sgi.com (8.11.4/8.11.4/linux-outbound_gateway-1.0) with ESMTP id f9ALcQK25922 for ; Wed, 10 Oct 2001 14:38:26 -0700
Received: (from mja@localhost) by trudge.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) id OAA32940 for state-threads@oss.sgi.com; Wed, 10 Oct 2001 14:36:03 -0700 (PDT)
From: mja@trudge.engr.sgi.com (Mike Abbott)
Message-Id: <200110102136.OAA32940@trudge.engr.sgi.com>
Subject: 1.3a
To: state-threads@oss.sgi.com
Date: Wed, 10 Oct 2001 14:36:02 -0700 (PDT)
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-state-threads@oss.sgi.com
Precedence: bulk

I wrote:
> I agree that when a network I/O operation times out after a partial
> success there ought to be a way to determine how much succeeded.
>
> [Actually I have coded up a version of this in the interim but haven't
> tried it yet. Anyone want to test it for me?]
This UNTESTED code, called version 1.3a (for alpha), is now available for download from:

http://oss.sgi.com/projects/state-threads/download

This does not address John Val's early-timeout issue, which we're still investigating. It only adds two new functions, st_read_resid and st_write_resid, to address the above deficiency. Read all about them in the included reference manual.

I welcome all kinds of feedback.

--
Michael J. Abbott mja@sgi.com www.repbot.org/mike

From owner-state-threads@oss.sgi.com Mon Oct 15 16:05:48 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9FN5mY18426 for state-threads-outgoing; Mon, 15 Oct 2001 16:05:48 -0700
Received: from mail.abeona.com ([209.81.58.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9FN5dD18420 for ; Mon, 15 Oct 2001 16:05:44 -0700
Received: from abeona.com (IDENT:gsh@concord [192.168.1.56]) by mail.abeona.com (8.9.3/8.9.3) with ESMTP id PAA07733 for ; Mon, 15 Oct 2001 15:59:41 -0700
Message-ID: <3BCB6BC9.C5C0ADD5@abeona.com>
Date: Mon, 15 Oct 2001 16:05:45 -0700
From: Gene Shekhtman
Organization: Abeona Networks, Inc.
X-Mailer: Mozilla 4.72 [en] (X11; I; Linux 2.2.12-20 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: state-threads@oss.sgi.com
Subject: Re: st_netfd_poll returns within the requested timeout with timed out error
References: <01092411323403.07213@bergsee.bio.uva.nl>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-state-threads@oss.sgi.com
Precedence: bulk

John Val wrote:
> During a computation I send the numerical results line after line to the
> client. Typically 1000 lines are send in one computation. On the client I
> noticed that some lines were not send correctly.

I think that John's problem is caused by the fact that ST time resolution is actually a time interval between scheduling points.
That time interval may be as large as 1 second in some situations (e.g., when a single thread does a lot of CPU-intensive work continuously without a context switch). Take a look at the _st_vp_check_clock() function in sched.c. The elapsed time is calculated as (now - _st_this_vp.last_clock), and if (elapsed >= thread->sleep) the thread is popped from the sleep queue. So, for example, if elapsed is 20ms and the timeout is 10ms, the thread will wake up as soon as _st_vp_check_clock() is called.

In John's case the specified I/O timeout of 10ms is less than the time interval between scheduling points. John's application issues about 1000 st_write()s before filling the output socket buffer and going into select() with a 10ms timeout. The Linux select(2) man page says:

    timeout is an *upper* bound on the amount of time elapsed before select returns.

If, for example, ~1000 st_write()s took 8ms and select() timed out after 4ms, the elapsed time is 12ms (> 10ms) and st_write() returns with a timeout error even though it waited for only 4ms.

I think that there is a misunderstanding around the whole timeout issue. In 99.9% of all cases I/O timeouts are used either for connection failure detection or to prevent a peer from holding an idle connection for too long. So for most applications realistic I/O timeouts should usually be on the order of seconds. Also, I can't see any point in retrying after a timeout has happened. If someone wants to retry after a timeout, why wouldn't he increase the timeout value to begin with?

If an application for some reason wants to know the number of bytes successfully transferred, it should use the st_*_resid() functions Mike added in the 1.3a version.
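
The 8ms + 4ms arithmetic above can be sketched as a small C model. This is only an illustration under stated assumptions: the names usec_t, model_timer_expired and model_time_blocked are hypothetical and are not part of State Threads, and the real check in _st_vp_check_clock() is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the check described above; NOT the real sched.c code. */
typedef uint64_t usec_t;

/*
 * The clock was last sampled at last_clock (a scheduling point). When it is
 * sampled again at now, a sleeper is timed out if the whole interval since
 * last_clock is at least its timeout, no matter when it actually blocked.
 */
bool model_timer_expired(usec_t last_clock, usec_t now, usec_t timeout)
{
    assert(now >= last_clock);          /* time must not run backwards */
    usec_t elapsed = now - last_clock;
    return elapsed >= timeout;
}

/* Time the thread really spent blocked, for comparison with the timeout. */
usec_t model_time_blocked(usec_t blocked_at, usec_t now)
{
    assert(now >= blocked_at);
    return now - blocked_at;
}
```

With the clock last sampled at 0, the thread blocking at 8000 usec and the clock checked again at 12000 usec, model_timer_expired(0, 12000, 10000) is true while model_time_blocked(8000, 12000) is only 4000: the 10ms timeout fires after a 4ms wait, as in the example.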
--Gene

From owner-state-threads@oss.sgi.com Thu Oct 25 07:18:08 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9PEI8I09256 for state-threads-outgoing; Thu, 25 Oct 2001 07:18:08 -0700
Received: from spelio.netli.lan ([194.58.47.40]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9PEHwD09253 for ; Thu, 25 Oct 2001 07:17:58 -0700
Received: (from vlm@localhost) by spelio.netli.lan (8.11.3/8.11.1) id f9PEHo308203 for state-threads@oss.sgi.com; Thu, 25 Oct 2001 18:17:50 +0400 (MSD) (envelope-from vlm)
Message-Id: <200110251417.f9PEHo308203@spelio.netli.lan>
Subject: strange code
To: state-threads@oss.sgi.com
Date: Thu, 25 Oct 2001 18:17:49 +0400 (MSD)
From: Lev Walkin
X-Mailer: ELM [version 2.4ME+ PL82 (25)]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII
Sender: owner-state-threads@oss.sgi.com
Precedence: bulk

Hi there,

The sources (io.c) has the following function:

===

int st_connect(st_netfd_t *fd, struct sockaddr *addr, int addrlen,
               st_utime_t timeout)
{
  int n, err = 0;

  while (connect(fd->osfd, addr, addrlen) < 0) {
    if (errno != EINTR) {
      if (errno != EINPROGRESS && (errno != EADDRINUSE || err == 0))
        return -1;
      /* Wait until the socket becomes writable */
      if (st_netfd_poll(fd, POLLOUT, timeout) < 0)
        return -1;
      /* Try to find out whether the connection setup succeeded or failed */
      n = sizeof(int);
      if (getsockopt(fd->osfd, SOL_SOCKET, SO_ERROR, (char *)&err,
                     (socklen_t *)&n) < 0)
        return -1;
      if (err) {
        errno = err;
        return -1;
      }
      break;
    }
    err = 1;
  }

  return 0;
}

===

Can anybody explain the hidden meaning of

    err = 1

string?
--
Lev Walkin
vlm@netli.com

From owner-state-threads@oss.sgi.com Thu Oct 25 16:19:09 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9PNJ9l08306 for state-threads-outgoing; Thu, 25 Oct 2001 16:19:09 -0700
Received: from mail.abeona.com ([209.81.58.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9PNJ4008302 for ; Thu, 25 Oct 2001 16:19:04 -0700
Received: from abeona.com (IDENT:gsh@concord [192.168.1.56]) by mail.abeona.com (8.9.3/8.9.3) with ESMTP id QAA06468; Thu, 25 Oct 2001 16:12:22 -0700
Message-ID: <3BD89DF2.60294384@abeona.com>
Date: Thu, 25 Oct 2001 16:19:14 -0700
From: Gene Shekhtman
Organization: Abeona Networks, Inc.
X-Mailer: Mozilla 4.72 [en] (X11; I; Linux 2.2.12-20 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Lev Walkin
CC: state-threads@oss.sgi.com
Subject: Re: strange code
References: <200110251417.f9PEHo308203@spelio.netli.lan>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-state-threads@oss.sgi.com
Precedence: bulk

Lev Walkin wrote:
> Hi there,
>
> The sources (io.c) has the following function:
>
> ===
>
> int st_connect(st_netfd_t *fd, struct sockaddr *addr, int addrlen,
>                st_utime_t timeout)
> {
>   int n, err = 0;
>
>   while (connect(fd->osfd, addr, addrlen) < 0) {
>     if (errno != EINTR) {
>       if (errno != EINPROGRESS && (errno != EADDRINUSE || err == 0))
>         return -1;
>       /* Wait until the socket becomes writable */
>       if (st_netfd_poll(fd, POLLOUT, timeout) < 0)
>         return -1;
>       /* Try to find out whether the connection setup succeeded or failed */
>       n = sizeof(int);
>       if (getsockopt(fd->osfd, SOL_SOCKET, SO_ERROR, (char *)&err,
>                      (socklen_t *)&n) < 0)
>         return -1;
>       if (err) {
>         errno = err;
>         return -1;
>       }
>       break;
>     }
>     err = 1;
>   }
>
>   return 0;
> }
>
> ===
>
> Can anybody explain the hidden meaning of
>
>     err = 1
>
> string?
>

Some platforms (e.g., IRIX 6.2, fixed in 6.5) have "peculiar" implementation of connect(2).
On those platforms if connect(2) is interrupted (errno == EINTR) after socket was bound by the kernel, the second connect(2) attempt will fail with errno == EADDRINUSE.

The code above ignores EADDRINUSE iff connect(2) was previously interrupted (the err flag was set).

--Gene

From owner-state-threads@oss.sgi.com Thu Oct 25 16:51:15 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9PNpFZ09319 for state-threads-outgoing; Thu, 25 Oct 2001 16:51:15 -0700
Received: from mail.abeona.com ([209.81.58.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9PNpC009316 for ; Thu, 25 Oct 2001 16:51:12 -0700
Received: from abeona.com (IDENT:gsh@concord [192.168.1.56]) by mail.abeona.com (8.9.3/8.9.3) with ESMTP id QAA07899; Thu, 25 Oct 2001 16:44:30 -0700
Message-ID: <3BD8A57A.EAFF2D24@abeona.com>
Date: Thu, 25 Oct 2001 16:51:22 -0700
From: Gene Shekhtman
Organization: Abeona Networks, Inc.
X-Mailer: Mozilla 4.72 [en] (X11; I; Linux 2.2.12-20 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Lev Walkin , state-threads@oss.sgi.com
Subject: Re: strange code
References: <200110251417.f9PEHo308203@spelio.netli.lan> <3BD89DF2.60294384@abeona.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-state-threads@oss.sgi.com
Precedence: bulk

Gene Shekhtman wrote:
> >
> > Can anybody explain the hidden meaning of
> >
> > err = 1
> >
> > string?
> >
>
> Some platforms (e.g., IRIX 6.2, fixed in 6.5) have "peculiar"
> implementation of connect(2). On those platforms if connect(2) is
> interrupted (errno == EINTR) after socket was bound by the kernel,
> the second connect(2) attempt will fail with errno == EADDRINUSE.
>
> The code above ignores EADDRINUSE iff connect(2) was previously
> interrupted (the err flag was set).
>
> --Gene

I just found more details in Rich Stevens' "UNIX Network Programming", Vol. 1, 2nd edition, p. 413 ("Interrupted connect").
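
The decision that st_connect() makes on each failed connect(2) call can be restated as a small pure function. This is a sketch for illustration only: the enum connect_action and the function classify_connect_errno are hypothetical names, not part of State Threads, and the was_interrupted argument stands in for the err flag in the quoted code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not part of the State Threads API. */
enum connect_action {
    CONNECT_RETRY,          /* call connect(2) again                        */
    CONNECT_WAIT_WRITABLE,  /* poll for POLLOUT, then check SO_ERROR        */
    CONNECT_FAIL            /* give up and report the error                 */
};

/*
 * err_no is the errno left by a failed connect(2); was_interrupted says
 * whether an earlier attempt on the same socket failed with EINTR.
 */
enum connect_action classify_connect_errno(int err_no, bool was_interrupted)
{
    assert(err_no != 0);    /* only meaningful after connect(2) failed */

    if (err_no == EINTR)
        return CONNECT_RETRY;
    if (err_no == EINPROGRESS)
        return CONNECT_WAIT_WRITABLE;
    /* Tolerate EADDRINUSE only when it follows an interrupted connect. */
    if (err_no == EADDRINUSE && was_interrupted)
        return CONNECT_WAIT_WRITABLE;
    return CONNECT_FAIL;
}
```

An EADDRINUSE on the very first attempt is still treated as a real error; only the retry after EINTR is allowed to fall through to the wait-for-writable path.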

From owner-state-threads@oss.sgi.com Mon Oct 29 21:36:15 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9U5aFL00788 for state-threads-outgoing; Mon, 29 Oct 2001 21:36:15 -0800
Received: from slog.snvl1.sfba.home.com (c1559817-a.snvl1.sfba.home.com [67.166.21.187]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9U5aD000785 for ; Mon, 29 Oct 2001 21:36:13 -0800
Received: (from mja@localhost) by slog.snvl1.sfba.home.com (8.11.2/8.11.2) id f9U5aRq01328; Mon, 29 Oct 2001 21:36:27 -0800
From: Mike Abbott
Message-Id: <200110300536.f9U5aRq01328@slog.snvl1.sfba.home.com>
Subject: State Threads Project moved
To: state-threads@oss.sgi.com, state-threads-announce@lists.sourceforge.net
Date: Mon, 29 Oct 2001 21:36:27 -0800 (PST)
X-Mailer: ELM [version 2.5 PL3]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-state-threads@oss.sgi.com
Precedence: bulk

The State Threads Project has moved from SGI to SourceForge. The old web site and mailing list will soon either disappear or redirect/forward to the new versions. The new site and lists do not point back to the old. Please subscribe to the new mailing lists yourself; old-list subscribers will not be automatically added to the new lists.

Old site: http://oss.sgi.com/projects/state-threads/
New site: http://state-threads.sourceforge.net/

There are three mailing lists now rather than just one (not that the list ever had heavy traffic but that's just how SourceForge likes to do things) so be sure to sign up for the appropriate list(s): one only for announcements, one for developers, and one for users.

Thanks for tolerating this administrivia. Gene and I believe this move is in the best interests of the project.

--
Mike Abbott
mabbott@users.sourceforge.net
State Threads Project co-administrator