Issue #14997 has been reported by maciej.mensfeld (Maciej Mensfeld).

----------------------------------------
Bug #14997: Socket connect timeout exceeds the timeout value for 
https://bugs.ruby-lang.org/issues/14997

* Author: maciej.mensfeld (Maciej Mensfeld)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 2.5.1
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Given a case, where a domain is being resolved to multiple IPs (4 in the following example):

```
dig debug-xyz.elb.us-east-1.amazonaws.com a

; <<>> DiG 9.10.3-P4-Ubuntu <<>> debug-xyz.elb.us-east-1.amazonaws.com a
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54375
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;debug-xyz.elb.us-east-1.amazonaws.com. IN A

;; ANSWER SECTION:
debug-xyz.elb.us-east-1.amazonaws.com. 60 IN A 172.31.86.79
debug-xyz.elb.us-east-1.amazonaws.com. 60 IN A 172.31.109.24
debug-xyz.elb.us-east-1.amazonaws.com. 60 IN A 172.31.119.55
debug-xyz.elb.us-east-1.amazonaws.com. 60 IN A 172.31.71.167

;; Query time: 4 msec
;; SERVER: 172.31.0.2#53(172.31.0.2)
;; WHEN: Tue Aug 14 13:46:18 UTC 2018
;; MSG SIZE  rcvd: 132
```

and when `connect_timeout` is set to a certain value (N), the overall timeout upon non-responsive endpoints that don't immediately throw an exception can reach `N * 4`.

This can disrupt some time-sensitive systems.

We've experienced it with the following setup:

- TCP server (event machine) behind an AWS NLB
- TCP server process goes down behind NLB but NLB is still responsive
- Socket connect_timeout is set to 100ms
- AWS NLB keeps the connection in the waiting state hoping that the service behind it will get back to normal (but it doesn't)
- Ruby timeouts after 100ms
- Ruby tries to connect to the next IP from the pool (AWS NLB again)
- Due to 4 hosts resolving, the overall timeout is 400ms.

Not sure whether this should be qualified as a bug or a feature, but I believe it should be definitely documented or there should be an option to "hard" block this limit.



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>