
DNS on toolforge kubernetes seems to fail regularly (20-25% of the time at least)
Closed, Resolved · Public · BUG REPORT

Assigned To: dcaro
Authored By: ArthurPSmith · Sat, Aug 24, 1:21 PM
Referenced Files:
F57294342: image.png · Mon, Aug 26, 7:53 AM
F57294340: image.png · Mon, Aug 26, 7:53 AM

Description

Steps to replicate the issue (include links if applicable):

Note: this is a PHP app running on Kubernetes; see /data/project/author-disambiguator etc.

What happens?:
Fatal error: Uncaught mysqli_sql_exception: php_network_getaddresses: getaddrinfo for tools.db.svc.eqiad.wmflabs failed: Temporary failure in name resolution in /data/project/author-disambiguator/public_html/lib/database_tools.php:15

What should have happened instead?:
You should have seen the default page for the application (after OAuth login)

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Event Timeline


CropTool has been having similar issues and is unable to connect to mediawiki.org and/or commons.wikimedia.org. See Commons_talk:CropTool#Unable_to_open_any_image_in_CropTool.

Same for my tool (pod spacemedia-6fdcc8d798-8sncn). It started failing at 2024-08-25T17:38:18.469Z with the error message "java.net.UnknownHostException: tools.db.svc.wikimedia.cloud".
I don't see name resolution problems on the bastion or on my Cloud VPS instances.

Failed on first try:

Fatal error: Uncaught mysqli_sql_exception: php_network_getaddresses: getaddrinfo for tools.db.svc.wikimedia.cloud failed: Temporary failure in name resolution in /data/project/author-disambiguator/public_html/lib/database_tools.php:15 Stack trace: #0 /data/project/author-disambiguator/public_html/lib/database_tools.php(15): mysqli->__construct() #1 /data/project/author-disambiguator/public_html/work_item_oauth.php(7): DatabaseTools->openToolDB() #2 {main} thrown in /data/project/author-disambiguator/public_html/lib/database_tools.php on line 15
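
For reference, a rough way to quantify the failure rate from inside an affected pod (a sketch only; the loop count and the name being resolved are arbitrary choices, not what was actually run here):

fail=0; total=100
# dig exits non-zero when the query times out; +tries=1 limits it to a single attempt
for i in $(seq "$total"); do
  dig +tries=1 +time=2 +short tools.db.svc.wikimedia.cloud >/dev/null 2>&1 || fail=$((fail + 1))
done
echo "failed ${fail}/${total} lookups"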

Getting this for AntiCompositeBot's nolicense task as well (Pod/anticompositebot.nolicense-cron-28743485-x7fqt on tools-k8s-worker-nfs-38):

2024-08-25 18:06:37 nolicense ERROR: (2003, "Can't connect to MySQL server on 'commonswiki.analytics.db.svc.wikimedia.cloud' ([Errno -3] Temporary failure in name resolution)")

I think this is related:

ERROR: TjfCliError: The jobs service seems to be down – please retry in a few minutes.
ERROR: Please report this issue to the Toolforge admins if it persists: https://w.wiki/6Zuu

tools.krinklebot is facing "Could not resolve host: commons.wikimedia.org" for production hostnames as well. This runs as a scheduled Toolforge job:

[2024-08-24T15:40:46+00:00] ERROR: Skipping [[Project:Auto-protected files/wikipedia/de]] due to RuntimeException: Could not resolve host: commons.wikimedia.org in /data/project/krinklebot/src/fileprotectionsync/src/FileProtectionSyncBot.php:282
[2024-08-24T15:41:17+00:00] ERROR: Skipping [[Project:Auto-protected files/wikipedia/en]] due to RuntimeException: Could not resolve host: commons.wikimedia.org in /data/project/krinklebot/src/fileprotectionsync/src/FileProtectionSyncBot.php:282
[2024-08-24T20:31:19+00:00] ERROR: Skipping [[Project:Auto-protected files/wikipedia/de]] due to RuntimeException: Could not resolve host: commons.wikimedia.org in /data/project/krinklebot/src/fileprotectionsync/src/FileProtectionSyncBot.php:282
[2024-08-25T19:10:55+00:00] ERROR: Skipping [[Project:Auto-protected files/wikipedia/de]] due to RuntimeException: Could not resolve host: commons.wikimedia.org in /data/project/krinklebot/src/fileprotectionsync/src/FileProtectionSyncBot.php:282
[2024-08-25T19:11:27+00:00] ERROR: Skipping [[Project:Auto-protected files/wikipedia/en]] due to RuntimeException: Could not resolve host: commons.wikimedia.org in /data/project/krinklebot/src/fileprotectionsync/src/FileProtectionSyncBot.php:282
[2024-08-25T19:11:58+00:00] ERROR: Skipping [[Project:Auto-protected files/wikinews/en]] due to RuntimeException: Could not resolve host: commons.wikimedia.org in /data/project/krinklebot/src/fileprotectionsync/src/FileProtectionSyncBot.php:282
[2024-08-25T19:12:29+00:00] ERROR: Skipping [[Project:Auto-protected files/wiktionary/en]] due to RuntimeException: Could not resolve host: commons.wikimedia.org in /data/project/krinklebot/src/fileprotectionsync/src/FileProtectionSyncBot.php:282
[2024-08-25T19:13:00+00:00] ERROR: Skipping [[Project:Auto-protected files/wikipedia/fa]] due to RuntimeException: Could not resolve host: commons.wikimedia.org in /data/project/krinklebot/src/fileprotectionsync/src/FileProtectionSyncBot.php:282
[2024-08-25T19:13:42+00:00] ERROR: Skipping [[Project:Auto-protected files/wikipedia/fr]] due to RuntimeException: Could not resolve host: commons.wikimedia.org in /data/project/krinklebot/src/fileprotectionsync/src/FileProtectionSyncBot.php:282

Just got a different message from https://author-disambiguator.toolforge.org/names_oauth.php?... This may be a result of a DNS failure not being caught? (If the request fails before any response is received, PHP never populates $http_response_header, which would explain the warning.)

Warning: Undefined variable $http_response_header in /data/project/author-disambiguator/public_html/lib/borrowed_utilities.php on line 41

mdaniels5757 triaged this task as Unbreak Now! priority. Sun, Aug 25, 9:02 PM

Noting here that I'm unable to use Build Service, probably due to the same issue. Related log line:

[step-clone] 2024-08-25T22:59:56.754700588Z {"level":"error","ts":1724626796.754072,"caller":"git/git.go:55","msg":"Error running git [fetch --recurse-submodules=yes --depth=1 origin --update-head-ok --force ]: exit status 128\nfatal: unable to access 'https://gitlab.wikimedia.org/toolforge-repos/techcontribs/': Could not resolve host: gitlab.wikimedia.org\n","stacktrace":"github.com/tektoncd/pipeline/pkg/git.run\n\tgithub.com/tektoncd/pipeline/pkg/git/git.go:55\ngithub.com/tektoncd/pipeline/pkg/git.Fetch\n\tgithub.com/tektoncd/pipeline/pkg/git/git.go:150\nmain.main\n\tgithub.com/tektoncd/pipeline/cmd/git-init/main.go:53\nruntime.main\n\truntime/proc.go:255"}

Are people still seeing this issue? I'm unable to reproduce the specific failure mentioned in the task description.

The last one I got was 2024-08-25 22:07:47Z. But it's been intermittent the whole time.

By 'intermittent' do you mean that it's always failing a little bit, or that every few hours it fails a lot for a few minutes?

I'm seeing failures of URLs like https://orcid-scraper.toolforge.org/results?qid=Q112671057

"Internal Server Error / The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."

For me the errors are gone (the Toolforge jobs service works, and I was able to build and deploy my tool). No more DNS errors; everything looks fine.

CoreDNS does not seem to have spikes in resource usage.

CPU:

image.png (CPU usage graph, 157 KB)

Memory:

image.png (memory usage graph, 96 KB)
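
(As an aside, not the dashboard used for the graphs above: a quick spot-check of CoreDNS resource usage is also possible with metrics-server, assuming the pods carry the usual k8s-app=kube-dns label:)

kubectl -n kube-system top pods -l k8s-app=kube-dns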

Looking

Hmm... from a webservice shell, we sometimes get a non-authoritative answer:

I have no name!@shell-1724659470:~$ nslookup tools-harbor.wmcloud.org
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   tools-harbor.wmcloud.org
Address: 172.16.5.140



I have no name!@shell-1724659470:~$ nslookup tools-harbor.wmcloud.org
Server:         10.96.0.10
Address:        10.96.0.10#53

Non-authoritative answer:
Name:   tools-harbor.wmcloud.org
Address: 172.16.5.140

Just manually scaled up the number of replicas for the coredns deployment from 2 to 4, and things seem to be improving. Is anyone still seeing issues?
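
(A minimal sketch of that scale-up, assuming coredns runs as a Deployment in the kube-system namespace with the usual k8s-app=kube-dns label:)

kubectl -n kube-system scale deployment coredns --replicas=4
# verify the new replicas are spread across different nodes
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide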

Yep, still having issues, looking

Querying from a webservice shell fails pretty frequently, even for internal names (and without domain searching, i.e. with a trailing .):

I have no name!@shell-1724670591:~$ time nslookup api.svc.tools.eqiad1.wikimedia.cloud.
Server:         10.96.0.10
Address:        10.96.0.10#53

api.svc.tools.eqiad1.wikimedia.cloud    canonical name = k8s.svc.tools.eqiad1.wikimedia.cloud.
Name:   k8s.svc.tools.eqiad1.wikimedia.cloud
Address: 172.16.6.113


real    0m0.041s
user    0m0.013s
sys     0m0.017s
########################################################################
I have no name!@shell-1724670591:~$ time nslookup api.svc.tools.eqiad1.wikimedia.cloud.
Server:         10.96.0.10
Address:        10.96.0.10#53

api.svc.tools.eqiad1.wikimedia.cloud    canonical name = k8s.svc.tools.eqiad1.wikimedia.cloud.
Name:   k8s.svc.tools.eqiad1.wikimedia.cloud
Address: 172.16.6.113
;; communications error to 10.96.0.10#53: timed out


real    0m5.050s
user    0m0.018s
sys     0m0.014s

It's running on worker-104

tools.wm-lol@tools-bastion-13:~$ kubectl get pods shell-1724670591 -o yaml | grep worker
  nodeName: tools-k8s-worker-104
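
(Equivalently, the node name can be queried directly with jsonpath instead of grepping the YAML:)

tools.wm-lol@tools-bastion-13:~$ kubectl get pods shell-1724670591 -o jsonpath='{.spec.nodeName}'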

From the coredns pod it's way more reliable:

root@tools-k8s-control-7:~# time nsenter -n -t 1775910 nslookup api.svc.tools.eqiad1.wikimedia.cloud. 10.96.0.10
Server:         10.96.0.10
Address:        10.96.0.10#53

api.svc.tools.eqiad1.wikimedia.cloud    canonical name = k8s.svc.tools.eqiad1.wikimedia.cloud.
Name:   k8s.svc.tools.eqiad1.wikimedia.cloud
Address: 172.16.6.113


real    0m0.049s
user    0m0.010s
sys     0m0.030s
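
(For context, the nsenter trick above runs the lookup inside the container's network namespace; the target PID, 1775910, is a process of the coredns container. A sketch of how such a PID can be found on the host, assuming a process name to match on:)

pid=$(pgrep -f coredns | head -n1)   # any PID inside the target container works
nsenter -n -t "$pid" nslookup api.svc.tools.eqiad1.wikimedia.cloud. 10.96.0.10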

Trying with nsenter from a few other containers/workers

I can reproduce with nsenter on the worker:

root@tools-k8s-worker-104:~# time nsenter -t 578510 -n nslookup api.svc.tools.eqiad1.wikimedia.cloud. 10.96.0.10
;; communications error to 10.96.0.10#53: timed out
Server:         10.96.0.10
Address:        10.96.0.10#53

api.svc.tools.eqiad1.wikimedia.cloud    canonical name = k8s.svc.tools.eqiad1.wikimedia.cloud.
Name:   k8s.svc.tools.eqiad1.wikimedia.cloud
Address: 172.16.6.113
;; communications error to 10.96.0.10#53: timed out


real    0m2.043s
user    0m0.021s
sys     0m0.020s

When trying to build an image from my GitHub repo, I get this strange issue:

unable to access 'https://github.com/Saisengen/wikibots/': Could not resolve host: github.com\n"

Could it be related to this issue?

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-26T12:42:55Z] <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-104 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-26T12:44:11Z] <wmbot~dcaro@urcuchillay> END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-104 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-26T12:53:14Z] <dcaro@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-104 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-26T12:53:19Z] <dcaro@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-104 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-26T13:05:06Z] <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-4, tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-18, tools-k8s-worker-nfs-25, tools-k8s-worker-nfs-51, tools-k8s-worker-nfs-52, tools-k8s-worker-104 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-26T13:12:41Z] <wmbot~dcaro@urcuchillay> END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-4, tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-18, tools-k8s-worker-nfs-25, tools-k8s-worker-nfs-51, tools-k8s-worker-nfs-52, tools-k8s-worker-104 (T373243)

So going around with cumin, we found some workers that fail often:

tools-k8s-worker-{nfs-{4,15,18,25,51,52},104}
# running this many times to get all the failures
root@cloudcumin1001:~# cumin --force 'O{project:tools name:.*worker.*}' 'nsenter -n -t $(pgrep calico| head -n1) dig +tries=1 tools-harbor.wmcloud.org @10.96.0.10'

The rest of the workers do not seem to fail. The failing ones are restarting right now, though that did not help with worker-104 :/, so we might have to find something else.
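
(Relatedly, a sketch of a more aggressive version of the probe above, making each worker run the lookup ten times itself so flaky nodes stand out in cumin's grouped output; the loop count is arbitrary:)

cumin --force 'O{project:tools name:.*worker.*}' 'fail=0; for i in 1 2 3 4 5 6 7 8 9 10; do nsenter -n -t $(pgrep calico | head -n1) dig +tries=1 +time=2 +short tools-harbor.wmcloud.org @10.96.0.10 >/dev/null || fail=$((fail+1)); done; echo "failed $fail/10"'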

The reboot did not help xd. The VMs are all running on different cloudvirts:

root@cloudcontrol1007:~# for node in tools-k8s-worker-{nfs-{4,15,18,25,51,52},104}; do echo "$node -> $(OS_PROJECT_ID=tools openstack server show $node | grep hypervisor_hostname)"; done
tools-k8s-worker-nfs-4 -> | OS-EXT-SRV-ATTR:hypervisor_hostname | cloudvirt1048.eqiad.wmnet |
tools-k8s-worker-nfs-15 -> | OS-EXT-SRV-ATTR:hypervisor_hostname | cloudvirt1034.eqiad.wmnet |
tools-k8s-worker-nfs-18 -> | OS-EXT-SRV-ATTR:hypervisor_hostname | cloudvirt1060.eqiad.wmnet |
tools-k8s-worker-nfs-25 -> | OS-EXT-SRV-ATTR:hypervisor_hostname | cloudvirt1032.eqiad.wmnet |
tools-k8s-worker-nfs-51 -> | OS-EXT-SRV-ATTR:hypervisor_hostname | cloudvirt1057.eqiad.wmnet |
tools-k8s-worker-nfs-52 -> | OS-EXT-SRV-ATTR:hypervisor_hostname | cloudvirt1032.eqiad.wmnet |
tools-k8s-worker-104 -> | OS-EXT-SRV-ATTR:hypervisor_hostname | cloudvirt1054.eqiad.wmnet |

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-26T14:03:24Z] <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-4 (T373243)

I have cordoned all the misbehaving workers, so users should stop seeing issues right now. I will try to debug in more detail, and will add new nodes if I can't find anything.
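
(The cordon itself is a single kubectl call per node; a sketch using the node list from above:)

for node in tools-k8s-worker-nfs-{4,15,18,25,51,52} tools-k8s-worker-104; do
  kubectl cordon "$node"
done
# cordoned nodes show up as SchedulingDisabled
kubectl get nodes | grep SchedulingDisabled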

Just to confirm I've done a few dozen actions that would have triggered this problem a few days ago, and everything is working. Thanks!

New nodes do not seem to have the issue, so I will continue adding new ones (added worker-nfs-57).

dcaro lowered the priority of this task from Unbreak Now! to Medium. Tue, Aug 27, 7:01 AM

Currently cleaning up the old nodes, but everything seems stable

> When trying to build an image from my GitHub repo, I get this strange issue:
>
> unable to access 'https://github.com/Saisengen/wikibots/': Could not resolve host: github.com\n"
>
> Could it be related to this issue?

Yes, that was caused by this issue; it should be gone now (if not, please report it).

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-27T08:24:38Z] <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-4 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-27T08:26:28Z] <wmbot~dcaro@urcuchillay> END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-4 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-27T08:26:55Z] <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-15 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-27T08:29:14Z] <wmbot~dcaro@urcuchillay> END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-15 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-27T08:29:23Z] <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-18 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-27T08:31:12Z] <wmbot~dcaro@urcuchillay> END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-18 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-27T08:31:21Z] <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-25 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-27T08:33:06Z] <wmbot~dcaro@urcuchillay> END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-25 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-27T08:34:07Z] <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-51 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-27T08:35:51Z] <wmbot~dcaro@urcuchillay> END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-51 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-27T08:37:08Z] <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-52 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-27T08:38:58Z] <wmbot~dcaro@urcuchillay> END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-52 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-27T08:53:37Z] <wmbot~dcaro@urcuchillay> START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-104 (T373243)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-08-27T08:55:28Z] <wmbot~dcaro@urcuchillay> END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-104 (T373243)

Yes, the problem is fixed, thanks.

dcaro claimed this task.

I'll close this as it's been stable for a while and all the misbehaving nodes have been deleted :)

The issues I was seeing previously appear to have all resolved themselves, thank you.

@dcaro My tool reads data from the DB replicas. Less than an hour ago the tool was working correctly, but now it returns this error (on 100% of tries): Unable to connect to any of the specified MySQL hosts. ---> System.ArgumentException: The host name or IP address is invalid.

The host name is ruwiki.

> @dcaro My tool reads data from the DB replicas. Less than an hour ago the tool was working correctly, but now it returns this error (on 100% of tries): Unable to connect to any of the specified MySQL hosts. ---> System.ArgumentException: The host name or IP address is invalid.
>
> The host name is ruwiki.

Which tool is it?
Do you have the snippet of code that does the call?

All the workers seem to be responding ok (might be flaky, but no errors so far):

root@cloudcumin1001:~# cumin --force 'O{project:tools name:.*worker.*}' 'nsenter -n -t $(pgrep calico| head -n1) dig +tries=1 +short ruwiki.analytics.db.svc.wikimedia.cloud @10.96.0.10'
63 hosts will be targeted:
tools-k8s-worker-[102-103,105-108].tools.eqiad1.wikimedia.cloud,tools-k8s-worker-nfs-[1-3,5-14,16-17,19-24,26-50,53-58,60-64].tools.eqiad1.wikimedia.cloud
FORCE mode enabled, continuing without confirmation
===== NODE GROUP =====
(63) tools-k8s-worker-[102-103,105-108].tools.eqiad1.wikimedia.cloud,tools-k8s-worker-nfs-[1-3,5-14,16-17,19-24,26-50,53-58,60-64].tools.eqiad1.wikimedia.cloud
----- OUTPUT of 'nsenter -n -t $(...loud @10.96.0.10' -----
s6.analytics.db.svc.wikimedia.cloud.
172.20.255.7
================
PASS |██████████████████████████████| 100% (63/63) [00:05<00:00, 12.16hosts/s]
FAIL |                              |   0% (0/63) [00:05<?, ?hosts/s]
100.0% (63/63) success ratio (>= 100.0% threshold) for command: 'nsenter -n -t $(...loud @10.96.0.10'.
100.0% (63/63) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.

@MBH I suspect this change: https://github.com/Saisengen/wikibots/commit/060db5fa675a14623426b88e851fa1a4f0f75e04#diff-e5265436c7ee5ee11cf4c1d17bca43ba895d05e53a357716f2691d39fd0f99d2R45

The wiki parameter in the URL you passed is at position 0, not 1 (you could use the wiki string as the index instead; that is less error-prone).

Thanks. I already use string indexing in other tools, but not in this one, because it's very old code.