CREATE EXTERNAL DATA SOURCE (Transact-SQL)
Article • 09/13/2023
Creates an external data source for querying using SQL Server, Azure SQL Database, Azure SQL Managed Instance,
Azure Synapse Analytics, Analytics Platform System (PDW), or Azure SQL Edge.
This article provides the syntax, arguments, remarks, permissions, and examples for whichever SQL product you
choose.
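Creates an external data source for PolyBase queries. External data sources are used to establish connectivity and
support these primary use cases: data virtualization and data load using PolyBase, and bulk load operations using
BULK INSERT or OPENROWSET.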
Note
This syntax varies in different versions of SQL Server. Use the version selector dropdown to choose the
appropriate version. This content applies to SQL Server 2022 (16.x) and later.
Syntax for SQL Server 2022 and later
For more information about the syntax conventions, see Transact-SQL syntax conventions.
syntaxsql
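As an orientation, the arguments described below combine into a statement of roughly the following shape. This is a
minimal sketch rather than the formal syntax diagram: the data source name MyExternalSource, the host remote-host,
and the credential MyCredential are hypothetical placeholders, and the credential is assumed to exist already.

```sql
-- Sketch only: every name and the host below are hypothetical placeholders.
CREATE EXTERNAL DATA SOURCE MyExternalSource
WITH (
    LOCATION = 'sqlserver://remote-host:1433',          -- '<prefix>://<path[:port]>'
    CONNECTION_OPTIONS = 'ApplicationIntent=ReadOnly',  -- optional key-value pairs
    CREDENTIAL = MyCredential,                          -- database scoped credential
    PUSHDOWN = ON                                       -- ON is the default
);
```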
Arguments
data_source_name
Specifies the user-defined name for the data source. The name must be unique within the database in SQL Server.
LOCATION = '<prefix>://<path[:port]>'
Provides the connectivity protocol and path to the external data source.
For S3-compatible object storage, the credential must be a database scoped credential where the IDENTITY is
hard-coded to IDENTITY = 'S3 Access Key' and the SECRET argument is in the format '<AccessKeyID>:<SecretKeyID>',
or you can use pass-through (STS) authorization. For more information, see Configure PolyBase to access external
data in S3-compatible object storage.
Location path:
port = The port that the external data source is listening on. Optional in many cases, depending on network
configuration.
<container_name> = The container of the storage account holding the data. Root containers are read-only; data
can't be written back to the container.
<instance_name> = The name of the SQL Server named instance. Used if you have SQL Server Browser Service running
on the target instance.
<bucket_name> = For S3-compatible object storage only (starting with SQL Server 2022 (16.x)), specific to the
storage platform.
<region> = For S3-compatible object storage only (starting with SQL Server 2022 (16.x)), specific to the
storage platform.
<folder> = Part of the storage path within the storage URL.
The SQL Server Database Engine doesn't verify the existence of the external data source when the object is
created. To validate, create an external table using the external data source.
You can use the sqlserver connector to connect SQL Server 2019 (15.x) to another SQL Server or to Azure
SQL Database.
Specify the Driver={<Name of Driver>} when connecting via ODBC.
The Hierarchical Namespace option for Azure Storage Accounts(V2) using the prefix adls is supported via
Azure Data Lake Storage Gen2 in SQL Server 2022 (16.x).
SQL Server support for HDFS Cloudera (CDP) and Hortonworks (HDP) external data sources is retired and
not included in SQL Server 2022 (16.x). There is no need to use the TYPE argument in SQL Server 2022 (16.x).
For more information on S3-compatible object storage and PolyBase starting with SQL Server 2022 (16.x), see
Configure PolyBase to access external data in S3-compatible object storage. For an example of querying a
parquet file within S3-compatible object storage, see Virtualize parquet file in a S3-compatible object storage
with PolyBase.
Unlike previous versions, in SQL Server 2022 (16.x) the prefix used for Azure Storage Account (v2)
changed from wasb[s] to abs.
Unlike previous versions, in SQL Server 2022 (16.x) the prefix used for Azure Data Lake Storage Gen2
changed from abfs[s] to adls.
For an example using PolyBase to virtualize a CSV file in Azure Storage, see Virtualize CSV file with PolyBase.
For an example using PolyBase to virtualize a delta table in ADLS Gen2, see Virtualize delta table with
PolyBase.
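SQL Server 2022 (16.x) fully supports two URL formats for both Azure Storage Account v2 (abs) and Azure
Data Lake Gen2 (adls).
The LOCATION path can use the formats <container>@<storage_account_name>.. (recommended) or
<storage_account_name>../<container>. For example, for Azure Storage Account v2:
abs://<container>@<storage_account_name>.blob.core.windows.net (recommended) or
abs://<storage_account_name>.blob.core.windows.net/<container>.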
CONNECTION_OPTIONS = key_value_pair
Applies to generic ODBC connections, as well as built-in ODBC connectors for SQL Server, Oracle, Teradata,
MongoDB, and Azure Cosmos DB API for MongoDB.
The key_value_pair is the keyword and the value for a specific connection option. The available keywords and
values depend on the external data source type. The name of the driver is required as a minimum, but there are
other options such as APP='<your_application_name>' or ApplicationIntent=ReadOnly|ReadWrite that are also
useful to set and can assist with troubleshooting.
Possible key value pairs are specific to the driver. For more information for each provider, see CREATE EXTERNAL
DATA SOURCE (Transact-SQL) CONNECTION_OPTIONS.
Starting in SQL Server 2022 (16.x) Cumulative Update 2, additional keywords were introduced to
support Oracle TNS files:
The keyword TNSNamesFile specifies the file path to the tnsnames.ora file located on the Oracle server.
The keyword ServerName specifies the alias used inside the tnsnames.ora file that replaces the host
name and the port.
PUSHDOWN = ON | OFF
Applies to: SQL Server 2019 (15.x) and later. States whether computation can be pushed down to the external data
source. It is on by default.
PUSHDOWN is supported when connecting to SQL Server, Oracle, Teradata, MongoDB, the Azure Cosmos DB API for
MongoDB, or ODBC at the external data source level.
CREDENTIAL = credential_name
Specifies a database-scoped credential for authenticating to the external data source.
CREDENTIAL is only required if the data has been secured. CREDENTIAL isn't required for data sets that allow
anonymous access.
When accessing Azure Storage Account (V2) or Azure Data Lake Storage Gen2, the IDENTITY must be SHARED
ACCESS SIGNATURE.
For an example, see Create an external data source to execute bulk operations and retrieve data from
Azure Storage into SQL Database.
You can create and configure an SAS with Azure Storage Explorer.
You can create an SAS programmatically via PowerShell, Azure CLI, .NET, and REST API. For more information,
see Grant limited access to Azure Storage resources using shared access signatures (SAS).
Action | Permission
Read data from multiple files and subfolders | Read and List
Use Create External Table as Select (CETAS) | Read, Create, List and Write
For an example of using a CREDENTIAL with S3-compatible object storage and PolyBase, see Configure PolyBase to
access external data in S3-compatible object storage.
To create a database scoped credential, see CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL).
Permissions
Requires CONTROL permission on database in SQL Server.
Locking
Takes a shared lock on the EXTERNAL DATA SOURCE object.
Security
PolyBase supports proxy-based authentication for most external data sources. Create a database scoped credential
to create the proxy account.
Users will also need to configure their external data sources to use new connectors when connecting to Azure
Storage.
Examples
Important
For information on how to install and enable PolyBase, see Install PolyBase on Windows.
A. Create external data source in SQL Server to reference Oracle
SQL
-- Create a database master key if one does not already exist, using your own password.
-- This key is used to encrypt the credential secret in next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>';
-- Create a database scoped credential with the Oracle user name and password as the secret.
CREATE DATABASE SCOPED CREDENTIAL OracleProxyAccount
WITH IDENTITY = 'oracle_username',
SECRET = 'oracle_password';
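The OracleProxyAccount credential is then referenced by the external data source itself. A minimal sketch, assuming
a hypothetical data source name MyOracleServer and a placeholder Oracle host at 145.145.145.145 listening on
port 1521:

```sql
-- MyOracleServer and the host address are hypothetical placeholders.
CREATE EXTERNAL DATA SOURCE MyOracleServer
WITH (
    LOCATION = 'oracle://145.145.145.145:1521',
    CREDENTIAL = OracleProxyAccount,  -- credential created in the previous step
    PUSHDOWN = ON                     -- allow computation to be pushed down to Oracle
);
```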
Optionally, the external data source to Oracle can use proxy authentication to provide fine grain access control. A
proxy user can be configured to have limited access compared to the user being impersonated.
SQL
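A possible shape for the proxy-authentication variant is sketched here. The credential name OracleProxyCredential,
the data source name OracleSalesSrvr, and the host address are hypothetical, and the ImpersonateUser connection
option is stated as an assumption about the Oracle connector; verify it against the CONNECTION_OPTIONS reference
for your driver before relying on it.

```sql
-- Hypothetical names throughout; the proxy user must already exist in Oracle.
CREATE DATABASE SCOPED CREDENTIAL OracleProxyCredential
WITH IDENTITY = 'oracle_username',
     SECRET = 'oracle_password';

CREATE EXTERNAL DATA SOURCE OracleSalesSrvr
WITH (
    LOCATION = 'oracle://145.145.145.145:1521',
    -- Assumption: this option asks the connector to impersonate the current user.
    CONNECTION_OPTIONS = 'ImpersonateUser=%CURRENT_USER',
    CREDENTIAL = OracleProxyCredential
);
```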
Starting in SQL Server 2022 (16.x) Cumulative Update 2, CREATE EXTERNAL DATA SOURCE supports the use of TNS
files when connecting to Oracle. The CONNECTION_OPTIONS parameter was expanded and now uses TNSNamesFile and
ServerName as variables to browse the tnsnames.ora file and establish a connection with the server.
In the following example, SQL Server searches at runtime for the tnsnames.ora file at the location specified by
TNSNamesFile and looks up the host and network port specified by ServerName.
SQL
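A minimal sketch of a TNS-based data source follows. The file path C:\Temp\tnsnames.ora, the alias XE, and the data
source name OracleTnsSource are hypothetical; the credential is assumed to be the OracleProxyAccount credential
created earlier.

```sql
-- Hypothetical path, alias, and data source name.
CREATE EXTERNAL DATA SOURCE OracleTnsSource
WITH (
    LOCATION = N'oracle://XE',
    CREDENTIAL = OracleProxyAccount,
    -- TNSNamesFile points at the tnsnames.ora file; ServerName is the alias inside it.
    CONNECTION_OPTIONS = N'TNSNamesFile=C:\Temp\tnsnames.ora;ServerName=XE'
);
```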
B. Create external data source to reference a SQL Server named instance via PolyBase connectivity
To create an external data source that references a named instance of SQL Server, use CONNECTION_OPTIONS to
specify the instance name.
First, create the database scoped credential, storing credentials for a SQL authenticated login. The SQL ODBC
Connector for PolyBase only supports basic authentication. Before you create a database scoped credential, the
database must have a master key to protect the credential. For more information, see CREATE MASTER KEY. The
following sample creates a database scoped credential; provide your own login and password.
SQL
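A minimal sketch of that credential, assuming the database master key already exists. The credential name
SQLServerCredentials and the login values are placeholders to replace with your own.

```sql
-- SQLServerCredentials is a hypothetical name; supply a real SQL authenticated login.
CREATE DATABASE SCOPED CREDENTIAL SQLServerCredentials
WITH IDENTITY = 'username',
     SECRET = 'password';
```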
In the following example, WINSQL2019 is the host name and SQL2019 is the instance name. 'Server=%s\SQL2019' is
the key value pair.
SQL
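The named-instance case can be sketched as follows, assuming a database scoped credential hypothetically named
SQLServerCredentials already exists. The data source name SQLServerInstance2 is also a placeholder; the host,
instance name, and key-value pair are the ones described above.

```sql
-- Hypothetical data source and credential names.
CREATE EXTERNAL DATA SOURCE SQLServerInstance2
WITH (
    LOCATION = 'sqlserver://WINSQL2019',        -- host name
    CONNECTION_OPTIONS = 'Server=%s\SQL2019',   -- %s is followed by the instance name
    CREDENTIAL = SQLServerCredentials
);
```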
Alternatively, you can use a port to connect to a SQL Server default instance.
SQL
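A sketch of the port-based alternative. The port 58137, the data source name SQLServerByPort, and the credential
SQLServerCredentials are hypothetical; use the port your instance actually listens on.

```sql
-- Hypothetical port, data source name, and credential name.
CREATE EXTERNAL DATA SOURCE SQLServerByPort
WITH (
    LOCATION = 'sqlserver://WINSQL2019:58137',  -- host name and port, no instance name
    CREDENTIAL = SQLServerCredentials
);
```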
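C. Create external data source to reference a readable secondary replica of an Always On availability group
To create an external data source that references a readable secondary replica of SQL Server, use
CONNECTION_OPTIONS to specify ApplicationIntent=ReadOnly.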
First, create the database scoped credential, storing credentials for a SQL authenticated login. The SQL ODBC
Connector for PolyBase only supports basic authentication. Before you create a database scoped credential, the
database must have a master key to protect the credential. For more information, see CREATE MASTER KEY. The
following sample creates a database scoped credential; provide your own login and password.
SQL
The ODBC Database parameter is not needed; instead, provide the database name via a three-part name in the
CREATE EXTERNAL TABLE statement, within the LOCATION parameter. For an example, see CREATE EXTERNAL
TABLE.
In the following example, WINSQL2019AGL is the availability group listener name and dbname is the name of the
database to be the target of the CREATE EXTERNAL TABLE statement.
SQL
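A minimal sketch of a data source that targets the listener with read-only intent. The data source name
SQLServerAGReadOnly and the credential SQLServerCredentials are hypothetical; WINSQL2019AGL and dbname are the
placeholders described above. Including Database=dbname in the connection options is optional, since the
database can instead be named in the external table's LOCATION.

```sql
-- Hypothetical data source and credential names.
CREATE EXTERNAL DATA SOURCE SQLServerAGReadOnly
WITH (
    LOCATION = 'sqlserver://WINSQL2019AGL',     -- availability group listener
    CONNECTION_OPTIONS = 'ApplicationIntent=ReadOnly; Database=dbname',
    CREDENTIAL = SQLServerCredentials
);
```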
You can demonstrate the redirection behavior of the availability group by specifying ApplicationIntent and
creating an external table on the system view sys.servers . In the following sample script, two external data
sources are created, and one external table is created for each. Use the views to test which server is responding to
the connection. Similar outcomes can also be achieved via the read-only routing feature. For more information, see
Configure read-only routing for an Always On availability group.
SQL
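The two data sources can be sketched as follows. Both names, and the credential SQLServerCredentials, are
hypothetical; the only difference between the two statements is the ApplicationIntent value.

```sql
-- Hypothetical names; both data sources point at the same listener.
CREATE EXTERNAL DATA SOURCE DataSource_SQLInstanceListener_ReadOnlyIntent
WITH (
    LOCATION = 'sqlserver://WINSQL2019AGL',
    CONNECTION_OPTIONS = 'ApplicationIntent=ReadOnly; Database=dbname',
    CREDENTIAL = SQLServerCredentials
);
GO

CREATE EXTERNAL DATA SOURCE DataSource_SQLInstanceListener_ReadWriteIntent
WITH (
    LOCATION = 'sqlserver://WINSQL2019AGL',
    CONNECTION_OPTIONS = 'ApplicationIntent=ReadWrite',
    CREDENTIAL = SQLServerCredentials
);
GO
```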
Inside the database in the availability group, create a view to return sys.servers and the name of the local
instance, which helps you identify which replica is responding to the query. For more information, see sys.servers.
SQL
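One way to set this up is sketched below. It assumes two external data sources already exist, one with ReadOnly
intent and one with ReadWrite intent, here hypothetically named DataSource_SQLInstanceListener_ReadOnlyIntent and
DataSource_SQLInstanceListener_ReadWriteIntent. The view name vw_sys_servers is also an assumption; the external
table names match the queries that follow.

```sql
-- Run inside the availability group database (dbname).
-- vw_sys_servers is a hypothetical view name.
CREATE VIEW dbo.vw_sys_servers
AS
SELECT [name]
FROM sys.servers
WHERE server_id = 0;   -- server_id 0 is the local instance
GO

-- Run on the source instance: one external table per data source.
CREATE EXTERNAL TABLE dbo.vw_sys_servers_ro
(
    [name] SYSNAME NOT NULL
)
WITH (
    DATA_SOURCE = DataSource_SQLInstanceListener_ReadOnlyIntent,
    LOCATION = N'dbname.dbo.vw_sys_servers'   -- three-part name of the remote view
);
GO

CREATE EXTERNAL TABLE dbo.vw_sys_servers_rw
(
    [name] SYSNAME NOT NULL
)
WITH (
    DATA_SOURCE = DataSource_SQLInstanceListener_ReadWriteIntent,
    LOCATION = N'dbname.dbo.vw_sys_servers'
);
GO
```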
SQL
SELECT [name]
FROM dbo.vw_sys_servers_ro; -- should return secondary replica instance
SELECT [name]
FROM dbo.vw_sys_servers_rw; -- should return primary replica instance
GO
D. Create external data source to query a parquet file in S3-compatible object storage via PolyBase
The following sample script creates an external data source s3_ds in the source user database in SQL Server. The
external data source references the s3_dc database scoped credential.
SQL
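A minimal sketch of the s3_dc credential and the s3_ds data source. The endpoint is left as the
<ip_address>:<port> placeholder, and the access key values are placeholders in the
'<AccessKeyID>:<SecretKeyID>' format that this article describes for S3-compatible storage.

```sql
-- Replace the placeholders with the values for your S3-compatible endpoint.
CREATE DATABASE SCOPED CREDENTIAL s3_dc
WITH IDENTITY = 'S3 Access Key',                -- hard-coded identity for S3
     SECRET = '<AccessKeyID>:<SecretKeyID>';
GO

CREATE EXTERNAL DATA SOURCE s3_ds
WITH (
    LOCATION = 's3://<ip_address>:<port>/',
    CREDENTIAL = s3_dc
);
GO
```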
Then, the following example demonstrates using T-SQL to query a parquet file stored in S3-compatible object
storage via an OPENROWSET query. For more information, see Virtualize parquet file in a S3-compatible object
storage with PolyBase.
SQL
SELECT *
FROM OPENROWSET (
BULK '/<bucket>/<parquet_folder>',
FORMAT = 'PARQUET',
DATA_SOURCE = 's3_ds'
) AS [cc];
E. Create external data source using generic ODBC to PostgreSQL
In this example, the generic ODBC data provider is used to connect to a PostgreSQL database server in the same
network, where the fully qualified domain name of the PostgreSQL server is POSTGRES1, using the default port of
TCP 5432.
SQL
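A minimal sketch of such a data source. The driver name PostgreSQL Unicode(x64) is an assumption that depends on
which PostgreSQL ODBC driver is installed on the SQL Server machine, and the data source name POSTGRES1_ODBC and
credential postgres_credential are hypothetical.

```sql
-- Hypothetical data source and credential names; the driver name must match
-- the ODBC driver actually installed on the server.
CREATE EXTERNAL DATA SOURCE POSTGRES1_ODBC
WITH (
    LOCATION = 'odbc://POSTGRES1:5432',
    CONNECTION_OPTIONS = 'Driver={PostgreSQL Unicode(x64)};',
    CREDENTIAL = postgres_credential
);
```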
Azure Storage
Action | Permission
Read data from multiple files and subfolders | Read and List
Use Create External Table as Select (CETAS) | Read, Create and Write
F. Create an external data source to access data in Azure Storage using the abs:// interface
Starting in SQL Server 2022 (16.x), use a new prefix abs for Azure Storage Account v2. The abs prefix supports
authentication using SHARED ACCESS SIGNATURE. The abs prefix replaces wasb, used in previous versions. HADOOP
is no longer supported, so there is no need to use TYPE = BLOB_STORAGE.
The Azure storage account key is no longer needed; use a SAS token instead, as in the following example:
SQL
-- Create a database master key if one does not already exist, using your own password.
-- This key is used to encrypt the credential secret in next step.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>';
GO
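With the master key in place, the credential and data source can be sketched as follows. The names
AzureStorageCredentialv2 and AzureStorageAbs are hypothetical, and the container, storage account, and SAS token
are placeholders to fill in.

```sql
-- Hypothetical names; the SAS token must not start with a leading '?'.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredentialv2
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<Blob_SAS_Token>';
GO

CREATE EXTERNAL DATA SOURCE AzureStorageAbs
WITH (
    LOCATION = 'abs://<container>@<storage_account>.blob.core.windows.net/',
    CREDENTIAL = AzureStorageCredentialv2
);
GO
```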
For a more detailed example on how to access CSV files stored in Azure Blob Storage, see Virtualize CSV file with
PolyBase.
G. Create external data source to access data in Azure Data Lake Gen2
Applies to: SQL Server 2022 (16.x) and later versions
Starting in SQL Server 2022 (16.x), use a new prefix adls for Azure Data Lake Gen2, replacing abfs used in
previous versions. The adls prefix also supports SAS token as authentication method as shown in this example:
SQL
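A minimal sketch using the adls prefix, assuming a database master key already exists. The names adls_credential
and AzureDataLakeGen2 are hypothetical, and the container, storage account, and SAS token are placeholders.

```sql
-- Hypothetical names; replace the placeholders with your own values.
CREATE DATABASE SCOPED CREDENTIAL adls_credential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<DataLakeGen2_SAS_Token>';
GO

CREATE EXTERNAL DATA SOURCE AzureDataLakeGen2
WITH (
    LOCATION = 'adls://<container>@<storage_account>.dfs.core.windows.net',
    CREDENTIAL = adls_credential
);
GO
```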
For a more detailed example on how to access delta files stored on Azure Data Lake Gen2, see Virtualize delta
table with PolyBase.
Do not add a trailing /, file name, or shared access signature parameters at the end of the LOCATION URL when
configuring an external data source for bulk operations.
H. Create an external data source for bulk operations retrieving data from Azure Storage
Use the following data source for bulk operations using BULK INSERT or OPENROWSET. The credential must set
SHARED ACCESS SIGNATURE as the identity, mustn't have the leading ? in the SAS token, must have at least read
permission on the file that should be loaded (for example srt=o&sp=r), and the expiration period should be valid
(all dates are in UTC time). For more information on shared access signatures, see Using Shared Access Signatures
(SAS).
SQL
CREATE DATABASE SCOPED CREDENTIAL AccessAzureInvoices
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
-- Remove ? from the beginning of the SAS token
SECRET = '<azure_shared_access_signature>';
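The data source that uses this credential can be sketched as follows. The data source name MyAzureInvoices, the
storage account newinvoices, and the container week3 are hypothetical. The sketch follows this article's guidance
that the TYPE argument is not needed in SQL Server 2022 (16.x); SQL Server 2017 and 2019 data sources for bulk
operations additionally specified TYPE = BLOB_STORAGE.

```sql
-- Hypothetical names; note there is no trailing '/', file name, or SAS parameter
-- at the end of the LOCATION URL.
CREATE EXTERNAL DATA SOURCE MyAzureInvoices
WITH (
    LOCATION = 'https://newinvoices.blob.core.windows.net/week3',
    CREDENTIAL = AccessAzureInvoices
);
```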
Next steps
ALTER EXTERNAL DATA SOURCE (Transact-SQL)
CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL)
CREATE EXTERNAL FILE FORMAT (Transact-SQL)
CREATE EXTERNAL TABLE (Transact-SQL)
sys.external_data_sources (Transact-SQL)
Using Shared Access Signatures (SAS)
PolyBase Connectivity Configuration