Terraform-provider-aws: Add SageMaker support

Created on 30 Nov 2017  ·  53 Comments  ·  Source: hashicorp/terraform-provider-aws

Amazon has announced the new service SageMaker to build, train, and deploy machine learning models at scale.

Prerequisite

aws-sdk-go v1.12.36 (https://github.com/terraform-providers/terraform-provider-aws/pull/2474)

Affected Resource(s)

The following resources are needed.

For building model and training:

  • aws_sagemaker_notebook_instance (https://github.com/terraform-providers/terraform-provider-aws/pull/2999)
  • aws_presigned_notebook_instance_url
  • aws_sagemaker_training_job (https://github.com/terraform-providers/terraform-provider-aws/pull/2955)
  • aws_sagemaker_lifecycle_config resource (https://github.com/terraform-providers/terraform-provider-aws/pull/7585)

For the deployment/hosting part:

  • aws_sagemaker_model (open PR is https://github.com/terraform-providers/terraform-provider-aws/pull/2478)
  • aws_sagemaker_endpoint (open PR is https://github.com/terraform-providers/terraform-provider-aws/pull/2479)
  • aws_sagemaker_endpoint_configuration (open PR is https://github.com/terraform-providers/terraform-provider-aws/pull/2477)

Expected Behavior

Create, update, delete, and import should work for the SageMaker deployment resources listed above:

resource "aws_sagemaker_endpoint" "foo" {
  name                 = "endpoint-foo"
  endpoint_config_name = "${aws_sagemaker_endpoint_configuration.ec.name}"
}

resource "aws_sagemaker_endpoint_configuration" "ec" {
  name = "endpoint-config-foo"

  production_variants {
    variant_name           = "variant-1"
    model_name             = "${aws_sagemaker_model.m.name}"
    initial_instance_count = 1
    # Value kept from the original proposal; SageMaker instance types
    # normally carry an "ml." prefix.
    instance_type          = "m3.xlarge"
    initial_variant_weight = 1
  }
}

resource "aws_sagemaker_model" "m" {
  name = "my-model"

  primary_container {
    # ECR registry hosts have the form <account>.dkr.ecr.<region>.amazonaws.com
    image          = "111111111111.dkr.ecr.us-west-2.amazonaws.com/my-docker-image:latest"
    model_data_url = "s3://111111111111-foo/model.tar.gz"
  }
}

Example

A terraform example that shows how to use the above resources to deploy your own model is here: https://github.com/terraform-providers/terraform-provider-aws/pull/2585

References

enhancement service/sagemaker

All 53 comments

Has anyone picked this one up yet?

@darrenhaken yes, I am working on it. See the linked PRs in the issue :-)

@jckuester I can't see a PR for the notebook instances, are you working on that too?

No, please start working on them if you like :)

Hi @jckuester @darrenhaken , anyone working on the training job resource? I'd like to contribute to this if possible

@ddcprg I am only working on the linked PRs above (i.e. for deployment). So please contribute the training job resource if you like.

cool @jckuester ! I'll be taking a look soon and comment back on this ticket with an initial prototype

@jckuester I know you've done this in your PR but I've raised #2924 just to add the vendor as per the README

I have the training job nearly done, I'm working on testing and I'll add the PR to this ticket as soon as I consider it ready

Training Job PR -> #2955

If no one has picked up the notebook resource yet I'll take it

Notebook Instance PR -> #2999

How can we help to put some traction on these PR's? I'm going to go over the code and add any new features included by AWS recently if needed

@jckuester would you mind adding a new item to the list above to include support for "lifecycle configurations"? I'll take a look and post the PR number later

@randomcamel @radeksimko how can we help to push these PR's forward?

+1, very interested in this functionality

Not been any movement on this for a while (looks like failed integration tests), any chance of getting it pushed on?

This has been open for a long time; we are still waiting for feedback from HashiCorp. The failures in the build are due to other issues in master at the time of merging, and they may well have gone away by now

@ashleyjkell from time to time I rebased my PRs (https://github.com/terraform-providers/terraform-provider-aws/pull/2477 - https://github.com/terraform-providers/terraform-provider-aws/pull/2479) to make the integration tests green again (I guess they still are). But as long as there is no feedback coming from HashiCorp, I will not waste any more energy keeping the PRs up to date or resolving conflicts with the main branch.

Any updates on this one?
Anyone from Hashicorp?

I am interested in this functionality as well. Updates?

I've just updated my PRs so they are green again. It'd be nice to have some feedback from @bflad or other TF committers. The PRs have been open for about 10 months now

@bflad Is there any way to get these PR's approved and merged?

Also adding my voice to this one. Interested in getting this merged

Hello! I'm sorry for the delay and I'm actively looking at getting SageMaker support merged into the aws provider.

I've just finished my review of the instance and training job resources and will be moving down the list noted here shortly.

awesome @mbfrahry ! I'll try to go through your review today and make the necessary changes

Thanks @mbfrahry! I'll try to find time on the weekend/next week to go through your reviews.

Hi! We're using Terraform v0.11.3. Do we need to update to the latest Terraform version in order to use this new resource (when merged), or will it get synced during terraform init (keeping 0.11.3)?

Hey @reynico, you won't have to do anything with modifying the version of Terraform. If you're pinned to a specific version of the aws provider, then you'll have to modify that to get these changes when they're released.

Any estimate at all when this will be released?

Hello everyone. Unfortunately I've not been able to make the modifications requested in @mbfrahry's review, and right now my access to AWS is limited, so I won't be able to test the code with the changes.

If anyone is willing to pick up the first two bullets in the description, that'd be great. The PR is #2999. Perhaps @jckuester? Whoever takes this on can squash my commits or start from scratch if you prefer.

Hey @ddcprg, sorry to hear about your troubles. I'll take this on with @tracypholmes if it hasn't been picked up yet

Sure @mbfrahry pick it up, apologies for the inconveniences

The new aws_sagemaker_notebook_instance resource has been released in version 1.56.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
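
For readers who pin the provider version, the constraint has to admit the release mentioned above. A minimal sketch, assuming the Terraform 0.11-era convention of setting version inside the provider block; the region value is a placeholder chosen for illustration and is not taken from this thread:

```hcl
# Sketch only: accept any 1.x release of the AWS provider from 1.56 onward,
# which includes the first release containing aws_sagemaker_notebook_instance.
provider "aws" {
  version = "~> 1.56"
  region  = "us-west-2" # placeholder region (assumption)
}
```

After changing the constraint, running terraform init -upgrade lets Terraform fetch a newer provider build that satisfies it.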

That is a great milestone! I am sorry to be "that person", but I have been more interested in the 3 outstanding deployment and hosting PRs: 2477, 2478, and 2479. Is there a current ETA on those now too?

@Erstwild we are very close :) I addressed all comments on the sagemaker PRs (which are in the 2nd feedback round now) and am now waiting for approval.

@julesjcraske
Is there any plan to implement the ability to disable direct internet access?
And what about attaching lifecycle configurations?

Many thanks

I would say the following are required in a follow-up PR.

  • LifecycleConfigName
  • DirectInternetAccess
  • DefaultCodeRepository
  • AdditionalCodeRepositories
  • VolumeSizeInGB (?)
  • AcceleratorTypes (?)

Hi there,
Do we have a roadmap or timeline for PRs 2477, 2478, and 2479, and for attaching lifecycle configurations?

The new aws_sagemaker_model resource has been released in version 1.58.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

Added a new PR for the aws_sagemaker_lifecycle_config resource (https://github.com/terraform-providers/terraform-provider-aws/pull/7585). I also have updates for the aws_sagemaker_notebook_instance in the pipeline (@saritajoshi9389, @bcatubig), as we need those resources in our company @yoyolabsio.

Added a PR for DirectInternetAccess support #7884 . Would love some feedback to see if things could be improved.

Here is another PR (https://github.com/terraform-providers/terraform-provider-aws/pull/8011) that adds all missing attributes to aws_sagemaker_notebook_instance, i.e. adapts the resource to the latest AWS API.

The aws_sagemaker_endpoint_configuration resource has been released in version 2.4.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

It would be really great to have the notebook lifecycle implemented. This is crucial for one of my clients.

Hi folks 👋 The following will be releasing in version 2.5.0 of the Terraform AWS Provider shortly:

  • New Resource: aws_sagemaker_endpoint
  • New Resource: aws_sagemaker_notebook_instance_lifecycle_configuration
  • resource/aws_sagemaker_notebook_instance: Add lifecycle_config_name argument
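
The two notebook-related items in the list above fit together roughly as follows. This is a sketch rather than a tested configuration: the resource names, the role ARN, and the shell commands are hypothetical placeholders, and the syntax follows the Terraform 0.11 interpolation style used earlier in this issue.

```hcl
# Lifecycle configuration: scripts must be supplied base64-encoded.
resource "aws_sagemaker_notebook_instance_lifecycle_configuration" "lc" {
  name      = "notebook-lifecycle-foo"
  on_create = "${base64encode("echo 'instance created'")}" # placeholder script
  on_start  = "${base64encode("echo 'instance started'")}" # placeholder script
}

# Notebook instance that attaches the lifecycle configuration by name.
resource "aws_sagemaker_notebook_instance" "ni" {
  name                  = "notebook-foo"
  role_arn              = "arn:aws:iam::111111111111:role/sagemaker-notebook-role" # hypothetical role
  instance_type         = "ml.t2.medium"
  lifecycle_config_name = "${aws_sagemaker_notebook_instance_lifecycle_configuration.lc.name}"
}
```

Referencing the lifecycle configuration through its name attribute, rather than repeating the string, lets Terraform create the configuration before the notebook instance.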

@bflad Is this released?

@rverma-nikiai The resources available for the AWS provider can be found at https://www.terraform.io/docs/providers/aws/ - if you type 'sagemaker' in the search box at the top left the resources mentioned above (and the new argument) have indeed been released.

Created a new PR for DirectInternetAccess support as I nuked my previous PR by mistake. The new PR is https://github.com/terraform-providers/terraform-provider-aws/pull/8618

Is it possible to change the timeout for creating the sagemaker endpoint? For me it stops after 1m saying

ResourceNotReady: failed waiting for successful resource state

Creating my endpoint usually takes around 15 minutes. Adding

timeouts {
  create = "30m"
  delete = "30m"
}

as is possible on other resources, gives me:

Error decoding timeout: Timeout Key (create) is not supported

Hi @LarsNeR 👋 My recommendation would be to submit new GitHub issues following the bug report and feature request templates for existing functionality. These large "support service X" GitHub issues tend to not have a clear definition of done and usually get closed out in preference of more targeted issues for tracking individual asks.

I've just seen it was an error on my side. Works as expected now.

Hi folks 👋

We just merged direct_internet_access argument support into the aws_sagemaker_notebook_instance resource. It will release with version 2.47.0 of the Terraform AWS Provider, tomorrow. Thanks to @bcatubig for the implementation. 👍

Please note that I'm going to close this GitHub issue since the majority of the original work for it has been completed and since there is not a clear definition of done over time. If you have specific bug reports with existing functionality or feature requests for additional functionality please use those issue templates. Thanks.
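
For reference, the direct_internet_access argument announced above might be used along these lines. This is a sketch under stated assumptions: the role ARN, subnet ID, and security group ID are hypothetical placeholders, and the example assumes that a notebook without direct internet access is placed in a VPC subnet.

```hcl
# Notebook instance reachable only through the VPC (no direct internet access).
resource "aws_sagemaker_notebook_instance" "private" {
  name          = "notebook-private"
  role_arn      = "arn:aws:iam::111111111111:role/sagemaker-notebook-role" # hypothetical role
  instance_type = "ml.t2.medium"

  # VPC placement; both IDs are placeholders.
  subnet_id       = "subnet-0123456789abcdef0"
  security_groups = ["sg-0123456789abcdef0"]

  # Accepts "Enabled" or "Disabled".
  direct_internet_access = "Disabled"
}
```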

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!
