Parameterize your Databricks notebooks with widgets

Widgets provide a way to parameterize notebooks in Databricks. If you need to call the same process for different values, you can create widgets to allow you to pass the variable values into the notebook, making your notebook code more reusable. You can then refer to those values throughout the notebook.

Note: There are also ipywidgets, which are visual elements that allow users to specify parameter values in notebook cells. These are different and have a different purpose. Databricks widgets are used to pass values between notebooks or to a job. ipywidgets do not support this functionality.

There are 4 types of Databricks widgets:

  • text
  • dropdowns
  • combobox
  • multiselect

You can create widgets in a notebook using Python, Scala, R, and SQL. I typically use Python or SQL, so my examples will be constrained to those two languages.

For all widgets, you can specify a name, default value, and label. For widgets other than the text type, you can specify the list of allowed values.

Widget Creation

Let’s look at widget creation.

%python
# Make some widgets in Python
dbutils.widgets.text("ProcessName", "Process A", "Name of the calling process")
dbutils.widgets.dropdown("State", "Colorado", ["Colorado", "Missouri", "Kansas", "Nebraska", "Texas"])

The above Python code makes the widgets shown below. The label argument used in the ProcessName widget is optional.

Databricks notebook with two widgets at the top. One is a textbox with the label "Name of the calling process". The other is a dropdown box named "State" with 5 values and a default value set.

Here are some examples of widgets created in SQL.

%sql
Use catalog devcatalog1;
Use schema myschema;
CREATE WIDGET TEXT Datalake DEFAULT "";
CREATE WIDGET COMBOBOX Product DEFAULT 'Product A' CHOICES SELECT DISTINCT ProductName FROM Products LIMIT 3

The above SQL code makes the widgets shown below. Notice that I provided an empty string for the default in the Datalake text widget. In the combobox widget, I used a query to populate the default values.

Databricks notebook with two widgets at the top. One is a textbox with the name "Datalake". The other is a combo box named "Product" with 3 values shown and the user typing in the box.

When you first see the Product widget “Product A” is selected, but as soon as you click on it the default is cleared and you can type in whatever you want. The items shown will filter based upon what you type so you can choose one of them or submit your own value.

Once your widgets are created, you can delete/comment out the code you used to create them and the widgets will remain in place.

Configuring Widgets in Your Notebook

In the widgets panel in your notebook, there are some configurations you can change. You can click the Edit button and drag the corner of any widget to change the width. You can select the widget label to move the widget to a different position, which changes the order of the widgets. By default they are shown alphabetically by label (which defaults to name if no label is provided).

Changing the width and order of widgets in a Databricks notebook. The selected widget shows up with a pink highlight.

When you click the settings button, you can adjust what happens when you change a widget value. There are three options:

  • Run Notebook: Every time a new value is selected, the entire notebook is rerun.
  • Run Accessed Commands: Every time a new value is selected, only cells that retrieve the values for that particular widget are rerun. This is the default setting when you create a widget. SQL cells are not rerun in this configuration.
  • Do Nothing: Every time a new value is selected, nothing is rerun.

I tend to set my widgets to do nothing, especially when I am first building and testing them.

The settings button also allows you to select whether to keep your widgets visible at the top of the notebook, or allow them to stay in the position on the page, becoming hidden as you scroll further down.

Referencing Widgets

Once you have widgets, you can get the current value as shown below.

%python
# Get widget value Python
stateval = dbutils.widgets.get("State")
processval = dbutils.widgets.get("ProcessName")
print("State: " + stateval + " Process: " + processval)
%sql
-- Get widget value SQL
Select "${State}" as State, "${ProcessName}" as Process

In one project I have worked on, we have the same objects replicated across multiple schemas in Unity Catalog, and there is basically a schema per country. All of the countries are in one catalog, and the catalogs represent environments (e.g., dev, test, prod).

So we have a notebook with widgets that is run once for each country/schema. After retrieving the widget values and storing them in variables, we use them to set the catalog and schema and tell the notebook where to create a staging table in Unity Catalog.

%python
# Set the current schema.
catalogstr = "USE CATALOG " + catalogfilterv + ";"
spark.sql(catalogstr)
schemastring = "Use schema " + schemafilterv
#print(schemastring)
spark.sql(schemastring)
# Write to staging table
finaldf.write.saveAsTable("stage_metricsreport")

In another data engineering notebook that gets called once per country, we are parsing a bunch of JSON files stored in a particular folder. We pass that folder location as a parameter to the notebook, facilitated by a widget named “JSONFolder”.

%sql
with base as (select explode(element_at(results.tables.rows[0],1 ) ) r
from json.`${JsonFolder}`
)
Select * from base

Widget Tips

Here are a few things to keep in mind as you use Databricks widgets:

  • You can create a widget in Python and reference it in SQL.
  • You can pass values to a widget when calling a notebook programmatically.
  • You need the Can Manage permission for a notebook in order to configure the widget layout.
  • The setting to pin the widget panel is maintained per user, not just per notebook.

How do you use Databricks widgets?

If you have implemented or seen a creative or really effective use of widgets in Databricks notebooks, feel free to share in the comments.

Go forth and parameterize!

Share

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trust DCAC with your data

Your data systems may be treading water today, but are they prepared for the next phase of your business growth?