This repository has been archived by the owner on Jul 16, 2019. It is now read-only.

Provider connection error: Service Fabric responded with error code 403 Client certificate required to request #77

Open
andrewdmoreno opened this issue Oct 31, 2018 · 17 comments

Comments

@andrewdmoreno

In a previous issue (#75) I started down the path of Windows security and, as suggested in that thread, switched to a secure cluster using certificates. I followed the instructions in the documentation for using OpenSSL to convert the PFX file to the required formats.

The final folder structure looks like this:

  • TraefikPkg
    • Code
      • certs
        • servicefabric.crt
        • servicefabric.key
      • traefik.toml
      • traefik.exe

traefik.toml and traefik.exe are located in TraefikPkg/Code.

The TOML file was updated to the following:

[serviceFabric.tls]
cert = "certs\servicefabric.crt"
key = "certs\servicefabric.key"
insecureskipverify = true
caoptional = true

Using the original certificate I can connect to the cluster via PowerShell, so I know it's valid. When I run Traefik, however, I get the following:

level=info msg="Starting provider *servicefabric.Provider {\"Watch\":false,\"Filename\":\"\",\"Constraints\":null,\"Trace\":false,\"TemplateVersion\":0,\"DebugLogGeneratedTemplate\":false,\"ClusterManagementURL\":\"https://xxxxxxx:19080\",\"APIVersion\":\"3.0\",\"RefreshSeconds\":10,\"TLS\":{\"CA\":\"\",\"CAOptional\":true,\"Cert\":\"certs/servicefabric.crt\",\"Key\":\"certs/servicefabric.key\",\"InsecureSkipVerify\":true}}" time="2018-10-31T11:37:59-04:00"

level=info msg="Checking service fabric config" time="2018-10-31T11:38:00-04:00"

level=error msg="Provider connection error: Service Fabric responded with error code 403 Client certificate required to request https://xxxxxxx:19080/Applications/?api-version=3.0 with body {}; retrying in 594.733437ms"

Any ideas on what else could be going wrong?

@andrewdmoreno
Author

To troubleshoot further, I made a request to https://xxxxx:19080/Applications/?api-version=3.0 in Postman and received the same 403 "Client certificate required" error reported above. I then added the .crt and .key files to Postman, and the API call succeeded. This leads me to believe that the certificate is not being included in Traefik's request.

@lawrencegripper
Collaborator

I've seen a similar error in the past when the pfx -> cert conversion hasn't worked as expected, but your Postman test should rule this out.

Other than that, there isn't anything obviously wrong with your TOML. Compared to the docs, the only differences are that you use \ where the docs use /, and the addition of caoptional. What happens without caoptional? Is the behavior the same with Traefik v1.6 and v1.7?

@andrewdmoreno
Author

andrewdmoreno commented Nov 1, 2018

I don’t think it’s the slashes: if I point the config at files that don’t exist, I get a different error stating that no PEM data could be extracted from the input.

I have tried both 1.6 and 1.7.

Question: are you aware of any restrictions Traefik imposes on the cert? For example, can it be a self-signed cert?

I know it is valid from Service Fabric’s perspective, as I have interacted with the cluster successfully via PowerShell and HTTPS.

That said, my cert is not self-signed; it’s signed by our company’s internal CA. I'm not sure whether that has any impact. I believe I have tried with caoptional set to both true and false, but I can confirm.

@andrewdmoreno
Author

For clarity, and for the benefit of anyone who comes across this thread: the direction of the slashes does matter, although I had already changed my config to use the proper direction. Sorry for the confusion there.

The latest TOML config I tested is the following:

[serviceFabric.tls]
  cert = "certs/servicefabric.crt"
  key = "certs/servicefabric.key"
  insecureskipverify = true

@andrewdmoreno
Author

@lawrencegripper Any other suggestions of things I could possibly try or look into?

@lawrencegripper
Collaborator

I'm at a bit of a loss here; I'm not sure why it would work for Postman but not for Traefik.

One approach I've used to debug issues in the past is to run Traefik outside the cluster on your dev box and run Fiddler (or Charles) alongside it to capture the requests and double-check their content.

One random thing that does stand out: you've used 'xxxx's to obscure the address in your logs, whereas I would have expected it to be localhost when Traefik is running inside the cluster.

If you're happy to post your whole TOML, I can take a look too and see if that helps.

@andrewdmoreno
Author

Oh, the reason for that is that, for troubleshooting purposes, I am running Traefik on my box but pointing it at our secured dev cluster on the dev server. I don't have RDP access to the server hosting the dev cluster, so getting the logs there made iterative debugging more challenging. I initially ran it there with the TOML file pointing to localhost, like in the samples (HTTPS, of course), and that is where I first hit the error. After some trial and error that way, I switched to my current debugging approach. From what I have read, what I am doing should be fine, correct?

I am more than willing to use Fiddler to debug. I have actually tried already, but have not been able to get any of the requests to appear in Fiddler. I will admit I have very limited experience with Fiddler.

What I have tried so far:

First Attempt

  1. Open Fiddler
  2. Start Traefik outside of Service Fabric from command line: traefik.exe --configfile=traefik.toml

With this approach I see Traefik writing to its log file, but nothing in Fiddler.

Second Attempt

  1. I saw online that I might need to adjust machine.config, so I modified C:\Windows\Microsoft.NET\Framework\v4.0.30319\Config\machine.config and, just in case, C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Config\machine.config.
  2. Added the following:
<system.net>
  <defaultProxy enabled = "true" useDefaultCredentials = "true">
    <proxy autoDetect="false" bypassonlocal="false" proxyaddress="http://127.0.0.1:8888" usesystemdefault="false" />
  </defaultProxy>
</system.net>  
  3. Restarted Traefik

Still nothing captured.

I'm assuming I'm doing something wrong here; any tips from your experience would be appreciated.

@lawrencegripper
Collaborator

Makes sense to me; we need to force the Go binary to use the Fiddler proxy for its requests. If this isn't doable in the TOML, you could try setting some of the environment variables mentioned here, but it's a long shot: https://golang.org/pkg/net/http/#ProxyFromEnvironment

@lawrencegripper
Collaborator

In the past when I've tested with HTTPS they've been self signed certs generated by KeyVault. Can you share some details about how you are generating the certs for the cluster?

Sorry I don't have any clear guidance as it is a bit of an odd one.

@andrewdmoreno
Author

I have access to a Key Vault. If you think it will make for a more apples-to-apples comparison, I can try to replicate the issue using a configuration as close as possible to what you have tested. Getting Traefik set up is a priority for us at this point, so if we have to reconfigure the dev cluster for troubleshooting, we are willing to do so. If you can share any details about how you set up your self-signed cert, I will do the same and upgrade our cluster to add the new cert.

@lawrencegripper
Collaborator

For our testing we used Azure clusters created via the Azure CLI. This creates a cluster and then spits out the necessary certs.

From the CLI --help command:

Create a cluster with a given size and self-signed certificate that is downloaded locally.
        az sf cluster create -g group-name -n cluster1 -l westus --cluster-size 4 --vm-password Password#1234 --certificate-output-folder MyCertificates --certificate-subject-name cluster1

@brechtvhb

brechtvhb commented Jun 7, 2019

Did you manage to solve this issue? I am having the same problem (running Traefik in a Linux container).

Connecting using curl works fine.

@andrewdmoreno
Author

@brechtvhb No, I never did solve the problem. We were not able to get Traefik working and decided to put that effort on hold at this time.

@solidcloudio

I'm having this problem also...

{"level":"error","msg":"Provider connection error: failed to connect to Service Fabric server Get http://localhost:19080/Applications/?api-version=3.0: EOF on http://localhost:19080/Applications/?api-version=3.0; retrying in 378.884064ms","time":"2019-07-10T17:44:52Z"}
{"level":"info","msg":"Checking service fabric config","time":"2019-07-10T17:45:03Z"}

I've tried both the admin cert and the server cert.

@solidcloudio

It seems to me that the cluster's certificates were issued to the cluster URL, i.e.:

subject=/CN=mycluster.westus2.cloudapp.azure.com issuer=/CN=mycluster.westus2.cloudapp.azure.com

and not to https://localhost:19080.

So how would that work? Should I add a self-signed localhost cert?
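Whether a certificate covers a given host name can be checked with Go's `x509.Certificate.VerifyHostname`. The sketch generates a throwaway self-signed certificate for the cluster-style name quoted above and shows that it does not cover `localhost`; `selfSignedFor` is a hypothetical helper for illustration. Note that Go 1.15 and later ignore the legacy Common Name by default and match only Subject Alternative Names, so a cert that sets nothing but `CN=` fails this check in Go. Also keep in mind that with `insecureskipverify = true` this check is skipped for the server's certificate; the sketch illustrates the name mismatch, not Service Fabric's own client-certificate rules.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSignedFor creates a hypothetical throwaway self-signed certificate whose
// CN and DNS SAN are both set to host.
func selfSignedFor(host string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: host},
		DNSNames:     []string{host},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
	}
	// Template and parent are the same certificate, making it self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := selfSignedFor("mycluster.westus2.cloudapp.azure.com")
	if err != nil {
		panic(err)
	}
	for _, host := range []string{"mycluster.westus2.cloudapp.azure.com", "localhost"} {
		if err := cert.VerifyHostname(host); err != nil {
			fmt.Printf("%s: not covered (%v)\n", host, err)
		} else {
			fmt.Printf("%s: covered\n", host)
		}
	}
}
```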

@solidcloudio

That seems to have done the trick. I created a third self-signed cert for "localhost" and added it to my cluster as a read-only certificate. I then exported the keys as outlined in the docs and put them in my CodePackage.

Now I've gotta figure out why the call returns nothing...

{"level":"info","msg":"Starting provider *servicefabric.Provider {\"Watch\":false,\"Filename\":\"\",\"Constraints\":null,\"Trace\":false,\"TemplateVersion\":0,\"DebugLogGeneratedTemplate\":false,\"ClusterManagementURL\":\"https://localhost:19080\",\"APIVersion\":\"3.0\",\"RefreshSeconds\":10000000000,\"TLS\":{\"cert\":\"certs/lcl-servicefabric.crt\",\"insecureSkipVerify\":true},\"AppInsightsClientName\":\"\",\"AppInsightsKey\":\"\",\"AppInsightsBatchSize\":0,\"AppInsightsInterval\":0}","time":"2019-07-10T18:17:22Z"}
{"level":"info","msg":"Checking service fabric config","time":"2019-07-10T18:17:32Z"}
{"level":"info","msg":"Checking service fabric config","time":"2019-07-10T18:17:43Z"}
{"level":"debug","msg":"Configuration received from provider servicefabric: {}","time":"2019-07-10T18:17:43Z"}
{"level":"info","msg":"Server configuration reloaded on :80","time":"2019-07-10T18:17:43Z"}
{"level":"info","msg":"Server configuration reloaded on :8030","time":"2019-07-10T18:17:43Z"}

@lawrencegripper
Collaborator

Nice, glad you got it working. One cause of an empty response may be that there are no healthy services in the cluster with the correct annotations to enable routing.


4 participants