Use natural language to explore and prepare data with a new capability of Amazon SageMaker Canvas

November 30, 2023

Today, I’m happy to introduce the ability to use natural language instructions in Amazon SageMaker Canvas to explore, visualize, and transform data for machine learning (ML).

SageMaker Canvas now supports using foundation model-(FM) powered natural language instructions to complement its comprehensive data preparation capabilities for data exploration, analysis, visualization, and transformation. Using natural language instructions, you can now explore and transform your data to build highly accurate ML models. This new capability is powered by Amazon Bedrock.

Data is the foundation for effective machine learning, and transforming raw data to make it suitable for ML model building and generating predictions is key to better insights. Analyzing, transforming, and preparing data to build ML models is often the most time-consuming part of the ML workflow. With SageMaker Canvas, data preparation for ML is seamless and fast with 300+ built-in transforms, analyses, and an in-depth data quality insights report without writing any code. Starting today, the process of data exploration and preparation is faster and simpler in SageMaker Canvas using natural language instructions for exploring, visualizing, and transforming data.

Data preparation tasks are now accelerated through a natural language experience using queries and responses. You can quickly get started with contextual, guided prompts to understand and explore your data.

Say I want to build an ML model to predict house prices Using SageMaker Canvas. First, I need to prepare my housing dataset to build an accurate model. To get started with the new natural language instructions, I open the SageMaker Canvas application, and in the left navigation pane, I choose Data Wrangler. Under the Data tab and from the list of available datasets, I select the canvas-housing-sample.csv as the dataset, then select Create a data flow and choose Create. I see the tabular view of my dataset and an introduction to the new Chat for data prep capability.


I select Chat for data prep, and it displays the chat interface with a set of guided prompts relevant to my dataset. I can use any of these prompts or query the data for something else.


First, I want to understand the quality of my dataset to identify any outliers or anomalies. I ask SageMaker Canvas to generate a data quality report to accomplish this task.


I see there are no major issues with my data. I would now like to visualize the distribution of a couple of features in the data. I ask SageMaker Canvas to plot a chart.


I now want to filter certain rows to transform my data. I ask SageMaker Canvas to remove rows where the population is less than 1,000. Canvas removes those rows, shows me a preview of the transformed data, and also gives me the option to view and update the code that generated the transform.


I am happy with the preview and add the transformed data to my list of data transform steps on the right. SageMaker Canvas adds the step along with the code.


Now that my data is transformed, I can go on to build my ML model to predict house prices and even deploy the model into production using the same visual interface of SageMaker Canvas, without writing a single line of code.

Data preparation has never been easier for ML!

The new capability in Amazon SageMaker Canvas to explore and transform data using natural language queries is available in all AWS Regions where Amazon SageMaker Canvas and Amazon Bedrock are supported.

Learn more
Amazon SageMaker Canvas product page

Go build!

— Irshad