Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tasks that claim to emit results but don't should fail #3497

Open
bobcatfish opened this issue Nov 4, 2020 · 19 comments
Open

Tasks that claim to emit results but don't should fail #3497

bobcatfish opened this issue Nov 4, 2020 · 19 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@bobcatfish
Copy link
Collaborator

Expected Behavior

If I create a Task which claims to emit a result, but it doesn't, the TaskRun should fail, i.e. I should be able to rely on the interface a Task claims to provide.

Actual Behavior

This will only be considered a failure if something in the Pipeline tries to use the result; if nothing tries to use the result, there will be no error, but if something does try to use the result you will get an error like:

tkn pr list
NAME                     STARTED          DURATION    STATUS
sum-three-pipeline-run   10 minutes ago   5 seconds   Failed(InvalidTaskResultReference)

Steps to Reproduce the Problem

I took this example pipeline with results and made a few changes:

  • add-task no longer emits a result, tho it claims to
  • the pipeline no longer refers to results from add-task
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: add-task
spec:
  params:
    - name: first
      description: the first operand
    - name: second
      description: the second operand
  results:
    - name: sum
      description: the sum of the first and second operand
  steps:
    - name: add
      image: alpine
      env:
        - name: OP1
          value: $(params.first)
        - name: OP2
          value: $(params.second)
      command: ["/bin/sh", "-c"]
      args:
        - echo -n $((${OP1}+${OP2}))
---
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: sum-three-pipeline
spec:
  params:
    - name: first
      description: the first operand
    - name: second
      description: the second operand
    - name: third
      description: the third operand
  tasks:
    - name: first-add
      taskRef:
        name: add-task
      params:
        - name: first
          value: $(params.first)
        - name: second
          value: $(params.second)
    - name: second-add
      taskRef:
        name: add-task
      params:
        - name: first
          value: "5"
        - name: second
          value: $(params.third)
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: sum-three-pipeline-run
spec:
  pipelineRef:
    name: sum-three-pipeline
  params:
    - name: first
      value: "2"
    - name: second
      value: "10"
    - name: third
      value: "10"

If you apply this, you will see that it runs successfully. If you add any references to the results, the run will fail.

Additional Info

  • Kubernetes version: v1.16.13-gke.401

  • Tekton Pipeline version: HEAD @ 7b5b2fa

@bobcatfish bobcatfish added the kind/bug Categorizes issue or PR as related to a bug. label Nov 4, 2020
@ghost
Copy link

ghost commented Nov 4, 2020

FYI in #3472 we're removing the behaviour where a PipelineRun is put into a failed state due to a Task dropping a result. Feedback welcome!

Edit for posterity: I was wrong - a PipelineRun was never put into a failed state due to a Task dropping a result. Only if the PipelineRun couldn't fetch the assocaited PipelineSpec while trying to resolve those result references. Ignore this message!

@pritidesai
Copy link
Member

thanks @bobcatfish, #2701 has some discussion as well

@afrittoli
Copy link
Member

I feel like this depends on what we define with "producing a result". I think with the current code today is:

  • no file created -> result not produced
  • file exists, empty -> empty result is produced (which will be an empty string)

That said, it is probably good to continue to force tasks to produce a result like they do today (unless we introduce support for default values, and a task specifies a default), as this would help task authors and users to avoid silent failures from being ignored.

@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 2, 2021
@tekton-robot
Copy link
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 1, 2021
@ghost
Copy link

ghost commented Jun 15, 2021

@bobcatfish do you still want to pursue this one or should we leave it for the time being?

@ghost
Copy link

ghost commented Jun 15, 2021

Note: with the new feature gate / flagging work this would be classified as a backwards incompatible change and would therefore need its own feature flag. It could be made the default for a new apiVersion, such as v1.

@bobcatfish
Copy link
Collaborator Author

Feels like we are still not 100% sure what the desired behavior is here so I'd like to keep discussing

/remove-lifecycle rotten

@bobcatfish bobcatfish self-assigned this Aug 10, 2021
@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 10, 2021
@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 8, 2021
@bobcatfish
Copy link
Collaborator Author

I'd still like to bring this forward at some point and decide how to address this as a community - maybe as part of v1 discussions? but it hasnt been a priority yet :(

/remove-lifecycle stale

@tekton-robot tekton-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 9, 2021
@bobcatfish bobcatfish mentioned this issue Nov 9, 2021
17 tasks
@jerop jerop added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jan 10, 2022
@jerop jerop self-assigned this Jan 10, 2022
@jerop
Copy link
Member

jerop commented Feb 4, 2022

In TEP-0075, we are planning to support object results. These results embrace the optional by default behavior of JSON Schema. A task can produce an object result with a missing attribute and it won't fail. However, a subsequent task will fail if it attempts to use the missing attribute from the object result, causing the pipeline to fail. That makes the object results attributes optional if not used, and required if used. That behavior is consistent with the existing behavior reported in this issue as a bug, so wondering if we should document this behavior and close this issue?

@bobcatfish
Copy link
Collaborator Author

I'm still leaning toward failing tasks that claim to emit results but fail and I'll try to explain why.

The Task spec defines the interface of the task, including:

  • params it needs to be operate, which can be optional if defaults are provided
  • results it declares it will produce, which consumers of the Task should be able to expect and rely on

In TEP-0075 we add support for results to declare they are of type object and we start to support defining the schema of that object, using JSON schema.

The crux of the problem is, what does it mean if a Task declares it will produce a result with a JSON schema indicating it will have key "foo" but it does not have that key at runtime? JSON schema allows for keys to be declared as "required" and assumes all keys are optional by default. So if a Task declares that it will produce a result with the key "foo" and it does not, it is still technically meeting the contract it declared in that the schema it defined did say that key was optional 😭

However I think it is different when we are talking about the results themselves; I don't think the same logic we apply to each individual result that a Task declares it produces needs to the same as the application of the schema we apply to validate those results. imo it all comes down to: did the Task fulfill the contract it declared. If it said it would make a result and did not make that result, I believe it did not fulfill its contract.

So in the example of a Task declaring a result with a schema indicating it is an object with (implied optional) key foo, I would think:

  • Producing no result at all (i.e. writing nothing to /tekton/results/my-result) would be an error
  • Producing an empty object (i.e. writing {} to /tekton/results/my-result) would fulfill its contract (it emitted the result it said it would and it matched the schema)
  • Producing an object with key "foo" (i.e. writing {"foo": "bar"} to /tekton/results/my-result) would fulfill its contract
  • Producing an object with key "foo" and additional keys (e.g. writing {"foo": "bar", "baz":"qux"} to /tekton/results/my-result) would also fulfill the contract (unless the schema included additionalProperties:false)

All of that being said I can see why this would be confusing 😅 maybe we can find a compromise in TEP-0075 that manages to balance both worlds, I'm going to make a suggestion over there.

@jerop jerop added this to the Pipelines v0.36 milestone Mar 10, 2022
@jerop
Copy link
Member

jerop commented Mar 15, 2022

@bobcatfish thank you for the detailed discussion above - makes sense to distinguish between producing results and keys in object results - especially given that users will have control to say that they keys are required if they wish

agreed, we can go ahead with making the results themselves required - we'll need to explain this well in the docs though (cc @vinamra28)

@bobcatfish bobcatfish removed their assignment Mar 18, 2022
@pritidesai
Copy link
Member

/assign @vinamra28

@tekton-robot
Copy link
Collaborator

@pritidesai: GitHub didn't allow me to assign the following users: vinamra28.

Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @vinamra28

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vinamra28
Copy link
Member

/assign

@pritidesai
Copy link
Member

@vinamra28 is implementing TEP-0048 which addresses this issue

@dibyom
Copy link
Member

dibyom commented Jul 17, 2023

related: #6817

@afrittoli
Copy link
Member

As discussed during the API WG on 17.07, moving this to nice-to-have for v1, in favour of #6932

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Projects
Status: In Progress
Status: Todo
Development

No branches or pull requests

8 participants