Amazon SageMaker Tutorials
In this Amazon SageMaker tutorial post, we will look at what Amazon SageMaker is and use it to build machine learning pipelines. We will look at using prebuilt algorithms as well as writing our own algorithms to build machine learning models, which we can then use for prediction.
Training a Job Through the High-Level SageMaker Client
Using Amazon SageMaker's Built-in Algorithms
Amazon SageMaker provides a set of built-in algorithms such as K-Means, LDA, and XGBoost, along with framework containers for TensorFlow and MXNet, which can be used directly as long as we convert our data into a format the SageMaker algorithms accept (recordio-protobuf, CSV, or LibSVM).
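For example, a NumPy feature matrix can be serialized to recordio-protobuf with the helpers in the SageMaker Python SDK and uploaded to S3. This is only a minimal sketch; the bucket name and object key below are placeholders, not part of this tutorial:

import io
import numpy as np
import boto3
import sagemaker.amazon.common as smac

# Toy feature matrix and labels; the built-in algorithms expect float32 tensors.
features = np.random.rand(100, 50).astype('float32')
labels = np.random.randint(0, 10, size=100).astype('float32')

# Serialize to recordio-protobuf in memory.
buf = io.BytesIO()
smac.write_numpy_to_dense_tensor(buf, features, labels)
buf.seek(0)

# Upload the serialized data to S3 so a built-in algorithm can train on it.
boto3.resource('s3').Bucket('my-bucket').Object('kmeans/train/data').upload_fileobj(buf)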
import pickle, gzip, numpy, urllib.request, json
# Load the dataset
urllib.request.urlretrieve("http://deeplearning.net/data/mnist/mnist.pkl.gz", "mnist.pkl.gz")
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
from sagemaker import KMeans
role = 'sagemaker-ARNRole'  # replace with the ARN of your SageMaker execution role
data_location = 's3://{}/sagemaker/kmeans_highlevel_example/data'.format(bucket)
output_location = 's3://{}/sagemaker/kmeans_example/output'.format(bucket)
kmeans = KMeans(role=role, train_instance_count=2, train_instance_type='ml.c4.8xlarge',
                output_path=output_location, k=10, data_location=data_location)
# You can even pass the data location directly
kmeans.fit(kmeans.record_set(train_set[0]))
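The comment about passing the data location directly refers to pointing fit at data that already lives in S3 instead of uploading it from memory with record_set. A hedged sketch, assuming recordio-protobuf files already sit under the prefix and that the record count and feature dimension below match your data:

from sagemaker.amazon.amazon_estimator import RecordSet

# Describe training data that is already serialized under an S3 prefix.
train_records = RecordSet(
    s3_data='s3://{}/sagemaker/kmeans_highlevel_example/data'.format(bucket),
    num_records=50000,    # number of training records under the prefix
    feature_dim=784,      # MNIST images are 28x28 = 784 features
    s3_data_type='S3Prefix')

kmeans.fit(train_records)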
- At this point you should have a model in output_location that can be used to deploy an endpoint (you can list the artifacts as in the sketch below).
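If you want to confirm the artifacts are there, a quick way is to list the output prefix with boto3. A minimal sketch; the prefix must match the output_location chosen above:

import boto3

# List the model artifacts that training wrote under output_location.
s3 = boto3.client('s3')
response = s3.list_objects_v2(Bucket=bucket, Prefix='sagemaker/kmeans_example/output')
for obj in response.get('Contents', []):
    print(obj['Key'])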
- Deploying
kmeans_predictor = kmeans.deploy(initial_instance_count=1,
                                 instance_type='ml.m4.xlarge')
- Validation
result = kmeans_predictor.predict(train_set[0][30:31])
print(result)
You can make single predictions, as above, or predict a whole batch of records against the same model, as sketched below.
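For a small batch, you can simply pass a slice of records to the endpoint in one request. A minimal sketch, assuming the response records expose a closest_cluster label as in the built-in K-Means examples:

# Predict a batch of 100 images in a single request to the endpoint.
batch_result = kmeans_predictor.predict(train_set[0][0:100])

# Each record in the response carries the closest cluster for that image.
clusters = [r.label['closest_cluster'].float32_tensor.values[0] for r in batch_result]
print(clusters[:10])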
Train a Model on Amazon SageMaker Using TensorFlow Custom Code (Build Your Own Model)
To build your own estimator, we pass an entry point file containing the code that SageMaker calls during the fit (training) and serving (hosting) phases; a skeleton of such a file is sketched after the directory listing below. custom_code_upload_location, as the name suggests, is an S3 location for our custom code that will be used for training and serving.
The high-level Python library provides the TensorFlow class, which has two methods: fit (for training a model) and deploy (for deploying a model).
In [7]: !ls
source_dir  (the directory that holds our custom code files)
In [8]: !ls source_dir/
iris_dnn_classifier.py some_other_file.py
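For reference, an entry-point script for this interface typically defines an estimator function plus input functions for training, evaluation, and serving. The skeleton below is only a sketch of what iris_dnn_classifier.py might contain; the exact function signatures depend on the SageMaker SDK version, and the feature shapes are placeholders for the iris data:

# iris_dnn_classifier.py - sketch of a SageMaker TensorFlow (legacy mode) entry point.
import numpy as np
import tensorflow as tf

INPUT_TENSOR_NAME = 'inputs'

def estimator_fn(run_config, params):
    # Build the tf.estimator that SageMaker trains and serves.
    feature_columns = [tf.feature_column.numeric_column(INPUT_TENSOR_NAME, shape=[4])]
    return tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                      hidden_units=[10, 20, 10],
                                      n_classes=3,
                                      config=run_config)

def train_input_fn(training_dir, params):
    # Read training data that SageMaker downloaded into training_dir.
    return _input_fn(training_dir, 'iris_training.csv')

def eval_input_fn(training_dir, params):
    return _input_fn(training_dir, 'iris_test.csv')

def serving_input_fn(params):
    # Describe the requests the deployed endpoint will accept.
    feature_spec = {INPUT_TENSOR_NAME: tf.FixedLenFeature(dtype=tf.float32, shape=[4])}
    return tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)()

def _input_fn(training_dir, file_name):
    data = np.loadtxt('{}/{}'.format(training_dir, file_name), delimiter=',', skiprows=1)
    return tf.estimator.inputs.numpy_input_fn(
        x={INPUT_TENSOR_NAME: np.array(data[:, :4], dtype=np.float32)},
        y=np.array(data[:, 4], dtype=np.int32),
        num_epochs=None,
        shuffle=True)()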
from sagemaker.tensorflow import TensorFlow

iris_estimator = TensorFlow(entry_point='iris_dnn_classifier.py', source_dir='source_dir', role=role,
                            output_path=model_artifacts_location, code_location=custom_code_upload_location,
                            train_instance_count=1, train_instance_type='ml.c4.xlarge',
                            training_steps=1000, evaluation_steps=100)
Some of these constructor parameters are used by the fit method call for model training in the next step (a fit and deploy sketch follows the parameter details below).
Details:
- entry_point — The example uses only one source file (iris_dnn_classifier.py). If your custom training code is stored in a single file, specify only the entry_point parameter. If it's stored in multiple files, also add the source_dir parameter.
- Specify only the source file that contains your custom code. The sagemaker.tensorflow.TensorFlow object determines which Docker image to use for model training when you call the fit method in the next step.
- output_path - Identifies the S3 location where you want to save the result of model training (model artifacts).
- code_location - S3 location where you want the fit method (in the next step) to upload the tar archive of your custom TensorFlow code.
- role - Identifies the IAM role that Amazon SageMaker assumes when performing tasks on your behalf, such as downloading training data from an S3 bucket for model training and uploading training results to an S3 bucket.
- hyperparameters - Any hyperparameters that you specify to influence the final quality of the model. Your custom training code uses these parameters.
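With the estimator constructed, training and deployment look much like the K-Means flow above. A minimal sketch, assuming your iris training data is already in S3 at the placeholder URI below:

# Kick off training; the S3 URI is a placeholder for wherever your training CSVs live.
iris_estimator.fit('s3://{}/sagemaker/iris/data'.format(bucket))

# Deploy the trained model behind a real-time endpoint.
iris_predictor = iris_estimator.deploy(initial_instance_count=1,
                                       instance_type='ml.m4.xlarge')

# Query the endpoint with a single flower's measurements.
print(iris_predictor.predict([6.4, 3.2, 4.5, 1.5]))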