Zero code for an AI pipeline using Google Vertex AI
As I wrote in my last blog post, Google has two important AI platforms: the classic AI Platform and Vertex AI.
With the traditional AI Platform, a PaaS service provided by Google, you have to write the model code yourself and train and deploy the models manually. With Vertex AI, model design and training are finished automatically via AutoML; a few clicks on the web page get the job done. Vertex is really a typical SaaS, though Google calls it a PaaS.
(My earlier post: "Start training AI job by using GCP ai platform from scratch".)
With Vertex AI, you write zero code to implement the whole AI pipeline, including training, deployment, evaluation, and prediction. In this article I will show you how, using a sample.
First, we prepare a dataset for training.
It is a CSV file with the following structure (as reported by Apache Spark):
scala> df.count
res2: Long = 32950
scala> df.printSchema
root
|-- age: integer (nullable = true)
|-- job: string (nullable = true)
|-- marital: string (nullable = true)
|-- education: string (nullable = true)
|-- default: string (nullable = true)
|-- housing: string (nullable = true)
|-- loan: string (nullable = true)
|-- contact: string (nullable = true)
|-- month: string (nullable = true)
|-- dayofweek: string (nullable = true)
|-- duration: integer (nullable = true)
|-- campaign: integer (nullable = true)
|-- pdays: integer (nullable = true)
|-- previous: integer (nullable = true)
|-- poutcome: string (nullable = true)
|-- y: string (nullable = true)
The dataset has 32,950 rows. It is loan data in which each row contains one lender's features, and the last column, "y", is the label (yes/no). A sample row looks like this:
37,entrepreneur,married,university.degree,no,no,no,telephone,nov,wed,202,2,999,1,failure,no
We put the dataset into Google Cloud Storage, from where Vertex will read it for training.
$ gsutil cp new_train.csv gs://mljobs
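By the way, if you prefer to script the upload instead of using gsutil, the google-cloud-storage client library can do the same thing. The snippet below is just a sketch; the local file and bucket names simply mirror the command above.

from google.cloud import storage

# Upload the training CSV to the same bucket used by the gsutil command above.
client = storage.Client()
bucket = client.bucket("mljobs")
bucket.blob("new_train.csv").upload_from_filename("new_train.csv")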
Now let's go to the Vertex AI web console.
We will do most of our work from here.
Dataset
Go to "Datasets" tab on the left, click "Create dataset" button to create a dataset.
You should choose the "Tabular" as dataset type.
Next, choose files from cloud storage for which you have to specify the correct path.
Click "continue", you have dataset prepared now.
Training
From your dataset page, click "Train new model" at the top right to set up a training job.
We will train a classification model using AutoML.
In the second step, set the "Target column" to the "y" column of the dataset.
In the last step, enter a value for the node-hour budget, then start the training process by clicking "Start training".
Now go to the "Training" tab on the left, where you will see the training jobs.
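If you ever want to kick off the same AutoML job from code rather than from the console, the SDK exposes an AutoML tabular training job. Again, this is only a sketch: the display name and node-hour budget are placeholders, and dataset is the object from the dataset sketch above.

# Assumes `dataset` from the TabularDataset sketch earlier.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="loan-automl-job",
    optimization_prediction_type="classification",
)

# 1000 milli node hours = 1 node hour; set the budget you are willing to pay for.
model = job.run(
    dataset=dataset,
    target_column="y",
    budget_milli_node_hours=1000,
)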
The training process takes a long time, even though this is a small dataset. Go get a coffee while you wait.
Update: this sample training job took 5 hr 11 min to finish. As I have said before, AutoML is expensive.
Evaluation
Before reading the evaluation reports provided by Vertex, you should know a few basic concepts:
Accuracy, Precision, Recall & F1 Score
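As a quick refresher, here is how those metrics are computed from a confusion matrix. The counts below are made up purely for illustration; they are not taken from this training job.

# Hypothetical confusion-matrix counts for a binary classifier.
tp, fp, fn, tn = 90, 10, 10, 890

accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # of the predicted positives, how many are real
recall = tp / (tp + fn)                      # of the real positives, how many we caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")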
Go to "Model registry" tab on the left, where you will see the models and versions.
Click the model you just trained, and select the right version, you will see the info for evaluation.
Here are the evaluation results for this version.
- PR AUC 0.979
- ROC AUC 0.978
- Log loss 0.188
- F1 score 0.9127329
- Precision 91.3%
- Recall 91.3%
As you can see, the model performs well on this classification job: AUC, log loss, F1 score, precision, and recall are all good.
The PR curve and ROC curve are presented as well.
Though this is only the first version, AutoML already delivers good performance. We can deploy it to production now.
Deployment
On your model page, click "Deploy & Test" at the top, then click "Deploy to endpoint" to deploy the model to an endpoint.
You have to set up a few options for the deployment; the most important one is the serving configuration, i.e. how many instances, and with what machine specs, will run the predictions.
After deployment, go to the "Endpoints" tab on the left, where you will see the deployed models.
Click the one you just deployed, and on the next page click "Sample request" at the top to see how to call the prediction service.
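The deployment can also be driven from the SDK if you prefer. A minimal sketch, assuming the model object from the training sketch above; the machine type and replica counts are placeholder choices.

# Deploy the trained model to a new endpoint with a small serving fleet.
endpoint = model.deploy(
    machine_type="n1-standard-4",   # placeholder machine spec
    min_replica_count=1,
    max_replica_count=1,
)
print(endpoint.resource_name)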
Prediction
Models are deployed to GCP endpoints. After deployment, the endpoint exposes a service URL for client access; you have to write your own client to request predictions.
Before doing that, you have to authorize the client libraries on your PC with GCP. Run the following command:
$ gcloud auth application-default login
Next, create a sample JSON file named "INPUT.JSON", which will be used to call the prediction service:
{
"instances": [
{"age":"37", "job":"entrepreneur", "marital":"married", "education":"university.degree", "default":"no", "housing":"no", "loan":"no", "contact":"telephone", "month":"nov", "dayofweek":"wed", "duration":"202", "campaign":"2", "pdays":"999", "previous":"1", "poutcome":"failure" }
]
}
Next, write the following bash script, whose content is copied from the "Sample request" page mentioned above:
#!/bin/bash
ENDPOINT_ID="1875555730753323008"
PROJECT_ID="891126174042"
INPUT_DATA_FILE="INPUT.JSON"
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-west1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-west1/endpoints/${ENDPOINT_ID}:predict \
-d "@${INPUT_DATA_FILE}"
Finally, run the bash script and you will see the prediction results.
{
"predictions": [
{
"classes": [
"no",
"yes"
],
"scores": [
0.95504599809646606,
0.044953998178243637
]
}
],
"deployedModelId": "7759191984564076544",
"model": "projects/891126174042/locations/us-west1/models/3530769331300335616",
"modelDisplayName": "untitled_1671499721692",
"modelVersionId": "1"
}
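For completeness, the same prediction can be requested with the Vertex AI SDK instead of curl. This is just a sketch that looks up the endpoint by the project and endpoint IDs used in the bash script above; the instance payload mirrors INPUT.JSON.

from google.cloud import aiplatform

aiplatform.init(project="891126174042", location="us-west1")

# Endpoint resource name built from the IDs in the bash script.
endpoint = aiplatform.Endpoint(
    "projects/891126174042/locations/us-west1/endpoints/1875555730753323008"
)

# One instance with the same fields as INPUT.JSON.
instance = {
    "age": "37", "job": "entrepreneur", "marital": "married",
    "education": "university.degree", "default": "no", "housing": "no",
    "loan": "no", "contact": "telephone", "month": "nov", "dayofweek": "wed",
    "duration": "202", "campaign": "2", "pdays": "999", "previous": "1",
    "poutcome": "failure",
}

response = endpoint.predict(instances=[instance])
print(response.predictions)   # classes and scores, as in the JSON output above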
Now all the jobs are done on the Vertex AI platform. As you can see, with the help of AutoML you don't need to write any code for the models: just provide the dataset and the training objective, and Vertex AI will build the best model for you.