Connecting to Databases
Superset does not ship bundled with connectivity to databases. The main step in connecting Superset to a database is to install the proper database driver(s) in your environment.
You'll need to install the required packages both for the database you want to use as your metadata database and for the databases you want to access through Superset. For information about setting up Superset's metadata database, please refer to the installation documentation (Docker Compose, Kubernetes).
This documentation tries to keep pointers to the drivers for commonly used database engines.
Installing Database Drivers
Superset requires a Python DB-API database driver and a SQLAlchemy dialect to be installed for each database engine you want to connect to.
You can read more here about how to install new database drivers into your Superset configuration.
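Before restarting Superset, you can sanity-check that the driver modules are importable from your environment. Here is a minimal stdlib sketch; the helper and the engine-to-module mapping are illustrative, not a Superset API, and note that a module name often differs from its PyPI package name (for example, the mysqlclient package provides the MySQLdb module):

```python
import importlib.util

def missing_drivers(driver_modules):
    """Return the engines whose Python driver module cannot be imported."""
    return [
        engine
        for engine, module in driver_modules.items()
        if importlib.util.find_spec(module) is None
    ]

# Illustrative mapping of dialect name to driver module.
print(missing_drivers({
    "postgresql": "psycopg2",
    "mysql": "MySQLdb",  # provided by the mysqlclient package
    "trino": "trino",
}))
```

Any engine printed by this sketch still needs its driver installed before Superset can connect to it.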
Supported Databases and Dependencies
Some of the recommended packages are shown below. Please refer to pyproject.toml for the versions that are compatible with Superset.
| Database | PyPI package | Connection String |
|---|---|---|
| AWS Athena | pip install pyathena[pandas], pip install PyAthenaJDBC | awsathena+rest://{access_key_id}:{access_key}@athena.{region}.amazonaws.com/{schema}?s3_staging_dir={s3_staging_dir}&... |
| AWS DynamoDB | pip install pydynamodb | dynamodb://{access_key_id}:{secret_access_key}@dynamodb.{region_name}.amazonaws.com?connector=superset |
| AWS Redshift | pip install sqlalchemy-redshift | redshift+psycopg2://<userName>:<DBPassword>@<AWS End Point>:5439/<Database Name> |
| Apache Doris | pip install pydoris | doris://<User>:<Password>@<Host>:<Port>/<Catalog>.<Database> |
| Apache Drill | pip install sqlalchemy-drill | drill+sadrill:// (for JDBC: drill+jdbc://) |
| Apache Druid | pip install pydruid | druid://<User>:<password>@<Host>:<Port-default-9088>/druid/v2/sql |
| Apache Hive | pip install pyhive | hive://hive@{hostname}:{port}/{database} |
| Apache Impala | pip install impyla | impala://{hostname}:{port}/{database} |
| Apache Kylin | pip install kylinpy | kylin://<username>:<password>@<hostname>:<port>/<project>?<param1>=<value1>&<param2>=<value2> |
| Apache Pinot | pip install pinotdb | pinot://BROKER:5436/query?server=http://CONTROLLER:5983/ |
| Apache Solr | pip install sqlalchemy-solr | solr://{username}:{password}@{hostname}:{port}/{server_path}/{collection} |
| Apache Spark SQL | pip install pyhive | hive://hive@{hostname}:{port}/{database} |
| Ascend.io | pip install impyla | ascend://{username}:{password}@{hostname}:{port}/{database}?auth_mechanism=PLAIN;use_ssl=true |
| Azure MS SQL | pip install pymssql | mssql+pymssql://UserName@presetSQL:TestPassword@presetSQL.database.windows.net:1433/TestSchema |
| ClickHouse | pip install clickhouse-connect | clickhousedb://{username}:{password}@{hostname}:{port}/{database} |
| CockroachDB | pip install cockroachdb | cockroachdb://root@{hostname}:{port}/{database}?sslmode=disable |
| Couchbase | pip install couchbase-sqlalchemy | couchbase://{username}:{password}@{hostname}:{port}?truststorepath={ssl certificate path} |
| Dremio | pip install sqlalchemy_dremio | dremio://user:pwd@host:31010/ |
| Elasticsearch | pip install elasticsearch-dbapi | elasticsearch+http://{user}:{password}@{host}:9200/ |
| Exasol | pip install sqlalchemy-exasol | exa+pyodbc://{username}:{password}@{hostname}:{port}/my_schema?CONNECTIONLCALL=en_US.UTF-8&driver=EXAODBC |
| Google BigQuery | pip install sqlalchemy-bigquery | bigquery://{project_id} |
| Google Sheets | pip install shillelagh[gsheetsapi] | gsheets:// |
| Firebolt | pip install firebolt-sqlalchemy | firebolt://{client_id}:{client_secret}@{database}/{engine_name}?account_name={name} |
| Hologres | pip install psycopg2 | postgresql+psycopg2://<UserName>:<DBPassword>@<Database Host>/<Database Name> |
| IBM Db2 | pip install ibm_db_sa | db2+ibm_db:// |
| IBM Netezza Performance Server | pip install nzalchemy | netezza+nzpy://<UserName>:<DBPassword>@<Database Host>/<Database Name> |
| MySQL | pip install mysqlclient | mysql://<UserName>:<DBPassword>@<Database Host>/<Database Name> |
| OceanBase | pip install oceanbase_py | oceanbase://<UserName>:<DBPassword>@<Database Host>/<Database Name> |
| Oracle | pip install cx_Oracle | oracle:// |
| PostgreSQL | pip install psycopg2 | postgresql://<UserName>:<DBPassword>@<Database Host>/<Database Name> |
| Presto | pip install pyhive | presto:// |
| Rockset | pip install rockset-sqlalchemy | rockset://<api_key>:@<api_server> |
| SAP Hana | pip install hdbcli sqlalchemy-hana or pip install apache_superset[hana] | hana://{username}:{password}@{host}:{port} |
| StarRocks | pip install starrocks | starrocks://<User>:<Password>@<Host>:<Port>/<Catalog>.<Database> |
| Snowflake | pip install snowflake-sqlalchemy | snowflake://{user}:{password}@{account}.{region}/{database}?role={role}&warehouse={warehouse} |
| SQLite | No additional library needed | sqlite://path/to/file.db?check_same_thread=false |
| SQL Server | pip install pymssql | mssql+pymssql:// |
| Teradata | pip install teradatasqlalchemy | teradatasql://{user}:{password}@{host} |
| TimescaleDB | pip install psycopg2 | postgresql://<UserName>:<DBPassword>@<Database Host>:<Port>/<Database Name> |
| Trino | pip install trino | trino://{username}:{password}@{hostname}:{port}/{catalog} |
| Vertica | pip install sqlalchemy-vertica-python | vertica+vertica_python://<UserName>:<DBPassword>@<Database Host>/<Database Name> |
| YugabyteDB | pip install psycopg2 | postgresql://<UserName>:<DBPassword>@<Database Host>/<Database Name> |
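Whichever database you pick, credentials containing reserved URI characters (such as @, /, or :) must be percent-encoded before being placed in the connection string. A small stdlib sketch; the helper name and signature are my own, not a Superset API:

```python
from urllib.parse import quote_plus

def build_uri(dialect, user, password, host, port, database):
    """Assemble a SQLAlchemy URI, percent-encoding the credentials."""
    return (
        f"{dialect}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )

# A password like "p@ss/word" would otherwise break URI parsing.
print(build_uri("postgresql", "alice", "p@ss/word", "db.example.com", 5432, "sales"))
# → postgresql://alice:p%40ss%2Fwordl@... is wrong; see test: p%40ss%2Fword
```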
Note that many other databases are supported; the main criterion is the existence of a functional SQLAlchemy dialect and Python driver. Searching for the keywords "sqlalchemy + (database name)" should help get you to the right place.
If your database or data engine isn't on the list but a SQL interface exists, please file an issue on the Superset GitHub repo, so we can work on documenting and supporting it.
If you'd like to build a database connector for Superset integration, read the following tutorial.
Installing Drivers in Docker Images
Superset requires a Python database driver to be installed for each additional type of database you want to connect to.
In this example, we'll walk through how to install the MySQL connector library. The connector library installation process is the same for all additional libraries.
1. Determine the driver you need
Consult the list of database drivers
and find the PyPI package needed to connect to your database. In this example, we're connecting
to a MySQL database, so we'll need the mysqlclient connector library.
2. Install the driver in the container
We need to get the mysqlclient library installed into the Superset docker container
(it doesn't matter if it's installed on the host machine). We could enter the running
container with docker exec -it <container_name> bash and run pip install mysqlclient
there, but that wouldn't persist permanently.
To address this, the Superset docker compose deployment uses the convention
of a requirements-local.txt file. All packages listed in this file will be installed
into the container from PyPI at runtime. This file will be ignored by Git for
the purposes of local development.
Create the file requirements-local.txt in a subdirectory called docker that
exists in the directory with your docker-compose.yml or docker-compose-non-dev.yml file.
# Run from the repo root:
touch ./docker/requirements-local.txt
Add the driver identified in the step above. You can use a text editor or do it from the command line, like so:
echo "mysqlclient" >> ./docker/requirements-local.txt
If you are running a stock (non-customized) Superset image, you are done.
Launch Superset with docker compose -f docker-compose-non-dev.yml up and
the driver should be present.
You can check its presence by entering the running container with
docker exec -it <container_name> bash and running pip freeze. The PyPI package should
be present in the printed list.
If you're running a customized docker image, rebuild your local image with the new driver baked in:
docker compose build --force-rm
After the rebuild of the Docker images is complete, relaunch Superset by
running docker compose up.
3. Connect to MySQL
Now that you've got a MySQL driver installed in your container, you should be able to connect to your database via the Superset web UI.
As an admin user, go to Settings -> Data: Database Connections and click the +DATABASE button. From there, follow the steps on the Using Database Connection UI page.
Consult the page for your specific database type in the Superset documentation to determine the connection string and any other parameters you need to input. For instance, on the MySQL page, we see that the connection string to a local MySQL database differs depending on whether the setup is running on Linux or Mac.
Click the “Test Connection” button, which should result in a popup message saying, "Connection looks good!".
4. Troubleshooting
If the test fails, review your Docker logs for error messages. Superset uses SQLAlchemy to connect to databases; to troubleshoot the connection string for your database, you might start Python in the Superset application container or host environment, connect directly to the desired database, and try to fetch data. This removes Superset from the equation and helps isolate the problem.
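Such a direct probe can be sketched with a few lines of SQLAlchemy. The probe helper below is illustrative, not part of Superset, and assumes SQLAlchemy 1.4 or later is installed:

```python
from sqlalchemy import create_engine, text

def probe(uri):
    """Try connecting with SQLAlchemy and running a trivial query."""
    try:
        engine = create_engine(uri)
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
        return True, None
    except Exception as exc:
        # The raw error usually names the missing driver or bad credential.
        return False, str(exc)

print(probe("sqlite://"))  # in-memory SQLite; a quick smoke test
```

If the probe fails, the returned error message typically points at the real culprit: a missing driver module, an unknown dialect, or rejected credentials.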
Repeat this process for each type of database you want Superset to connect to.
Database-specific Instructions
Ascend.io
The recommended connector library to Ascend.io is impyla.
The expected connection string is formatted as follows:
ascend://{username}:{password}@{hostname}:{port}/{database}?auth_mechanism=PLAIN;use_ssl=true
Apache Doris
The sqlalchemy-doris library is the recommended way to connect to Apache Doris through SQLAlchemy.
You'll need the following setting values to form the connection string:
- User: User Name
- Password: Password
- Host: Doris FE Host
- Port: Doris FE port
- Catalog: Catalog Name
- Database: Database Name
Here's what the connection string looks like:
doris://<User>:<Password>@<Host>:<Port>/<Catalog>.<Database>
AWS Athena
PyAthenaJDBC
PyAthenaJDBC is a Python DB API 2.0 compliant wrapper for the Amazon Athena JDBC driver.
The connection string for Amazon Athena is as follows:
awsathena+jdbc://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...
Note that you'll need to percent-encode the S3 staging directory when forming the connection string, like so:
s3://... -> s3%3A//...
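This encoding is plain percent-encoding of the reserved characters in the S3 path, which Python's stdlib can do. A sketch, keeping / unescaped to match the example above; the helper name is my own:

```python
from urllib.parse import quote

def encode_staging_dir(s3_path):
    """Percent-encode an S3 path for the s3_staging_dir query parameter."""
    return quote(s3_path, safe="/")

print(encode_staging_dir("s3://my-bucket/athena-results/"))
# → s3%3A//my-bucket/athena-results/
```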
PyAthena
You can also use the PyAthena library (no Java required) with the following connection string:
awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...
The PyAthena library also allows you to assume a specific IAM role, which you can define by adding the following parameters in Superset's Athena database connection UI under ADVANCED --> Other --> ENGINE PARAMETERS.
{
"connect_args": {
"role_arn": "<role arn>"
}
}
AWS DynamoDB
PyDynamoDB
PyDynamoDB is a Python DB API 2.0 (PEP 249) client for Amazon DynamoDB.
The connection string for Amazon DynamoDB is as follows:
dynamodb://{aws_access_key_id}:{aws_secret_access_key}@dynamodb.{region_name}.amazonaws.com:443?connector=superset
To get more documentation, please visit: PyDynamoDB WIKI.
AWS Redshift
The sqlalchemy-redshift library is the recommended way to connect to Redshift through SQLAlchemy.
This dialect requires either redshift_connector or psycopg2 to work properly.
You'll need to set the following values to form the connection string:
- User Name: userName
- Password: DBPassword
- Database Host: AWS Endpoint
- Database Name: Database Name
- Port: default 5439
psycopg2
Here's what the SQLALCHEMY URI looks like:
redshift+psycopg2://<userName>:<DBPassword>@<AWS End Point>:5439/<Database Name>
redshift_connector
Here's what the SQLALCHEMY URI looks like:
redshift+redshift_connector://<userName>:<DBPassword>@<AWS End Point>:5439/<Database Name>
Using IAM-based credentials with a Redshift cluster:
Amazon Redshift clusters also support generating temporary IAM-based database user credentials.
Your Superset app's IAM role should have permission to call the redshift:GetClusterCredentials operation.
You have to define the following arguments in Superset's Redshift database connection UI under ADVANCED --> Others --> ENGINE PARAMETERS.
{
  "connect_args": {
    "iam": true,
    "database": "<database>",
    "cluster_identifier": "<cluster_identifier>",
    "db_user": "<db_user>"
  }
}
and the SQLALCHEMY URI should be set to redshift+redshift_connector://