介绍 (Introduction)

This article will teach you how to install PolyBase and will show you a simple example to start.

本文将教您如何安装PolyBase,并向您展示一个简单的示例。

PolyBase is a new feature in SQL Server 2016. It is used to query relational and non-relational databases (NoSQL). You can use PolyBase to query tables and files in Hadoop or in Azure Blob Storage. You can also import or export data to/from Hadoop.

PolyBase是SQL Server 2016中的新功能。它用于查询关系数据库和非关系数据库(NoSQL)。 您可以使用PolyBase查询Hadoop或Azure Blob存储中的表和文件。 您还可以向Hadoop导入数据或从Hadoop导出数据。

In this example, we will show how to query a CSV file stored in Azure Blob storage from SQL Server 2016 using PolyBase.

在此示例中,我们将展示如何使用PolyBase从SQL Server 2016查询存储在Azure Blob存储中的CSV文件。

要求 (Requirements)

  1. A subscription to Azure Portal.

    订阅Azure门户。
  2. SQL Server installed.

    已安装SQL Server。
  3. MASE installed. MASE 。
  4. SQL Server Management Studio (SSMS) installed.

    已安装SQL Server Management Studio(SSMS)。

入门 (Getting Started)

We will first show how to install PolyBase and next, we will show an example in Azure to query a CSV file from SSMS 2016.

我们将首先展示如何安装PolyBase,接下来,我们将展示Azure中从SSMS 2016查询CSV文件的示例。

安装 (Installation)

PolyBase is included in the SQL Server 2016 installer. It is a new feature. When you install it or when you want to add this feature, you need to select the New SQL Server stand-alone installation or add feature to an existing installation:

PolyBase包含在SQL Server 2016安装程序中。 这是一个新功能。 在安装时或想要添加此功能时,需要选择“ 新建SQL Server独立安装”或将功能添加到现有安装中:



Run the installer until the Feature Selection and make sure that PolyBase Query Service for External Data is selected:

运行安装程序,直到选择“功能”,并确保已选择“用于外部数据的PolyBase查询服务”:



If it was not installed before, the setup will ask you the Oracle Java Runtime Environment 7. JR7 or later is required, because PolyBase uses Java to connect to Hadoop:

如果之前未安装过,安装程序将询问您Oracle Java Runtime Environment7。由于PolyBase使用Java连接到Hadoop,因此需要JR7或更高版本:



You can download JRE in the Oracle JRE page:
Server JRE (Java SE Runtime Environment) 8 Downloads

您可以在Oracle JRE页面中下载JRE:
Server JRE(Java SE运行时环境)8下载

Download the Windows version for your machine with SQL Server:

使用SQL Server为您的计算机下载Windows版本:


Finally, to enable Polybase, use the following configuration sentence:

最后,要启用Polybase,请使用以下配置语句:

 
USE master;  
GO  
EXEC sp_configure 'show advanced option', '1';  
RECONFIGURE; 
GO
EXEC sp_configure 'hadoop connectivity', 4; 
GO 
RECONFIGURE; 
GO
 

Show advanced option in 1 is used to enable advanced option and hadoop connectivity equal to 4, is used to connect to Hadoop in Windows Servers.

显示1中的高级选项用于启用高级选项,Hadoop连接等于4,用于连接Windows Server中的Hadoop。

在Azure中查询CSV文件 (Querying a csv file in Azure)

In this example, we are going to query a csv file that we will upload to Azure. We will query the csv file in our local SQL Server using PolyBase.

在此示例中,我们将查询一个要上传到Azure的csv文件。 我们将使用PolyBase在本地SQL Server中查询csv文件。

We created a file named customers.csv with the following content:

我们创建了一个名为customers.csv的文件,其内容如下:

Name,Lastname,email
john,Rambo,jrambo@hotmail
john,connor,jconnor@hotmail
elvis,presley,epresley@hotmail
elmer,hermosa,ehermosa@gmail

姓名,姓氏,电子邮件
约翰,兰博,jrambo @ hotmail
约翰,康纳,jconnor @ hotmail
猫王,普雷斯利,epresley @ hotmail
elmer,hermosa,ehermosa @ gmail

It is just a file with the customers’ name, last name and email.

它只是一个包含客户名称,姓氏和电子邮件的文件。

If you do not have a Storage account in Azure, go to the Azure Portal and add a new storage account. In this example, the storage account name is polybasestoragesqlshack:

如果您在Azure中没有存储帐户,请转到Azure门户并添加一个新的存储帐户。 在此示例中,存储帐户名称为polybasestoragesqlshack:



If you need more information about creating storage accounts in Azure, please read our article about Storage accounts.

如果您需要有关在Azure中创建存储帐户的更多信息,请阅读有关存储帐户的文章 。

Connect to your Azure Subscription in MASE and right click on the Azure Account to create a container named mycontainer or any name of your preference:

在MASE中连接到您的Azure订阅,然后右键单击Azure帐户以创建一个名为mycontainer或您的首选项的名称的容器:



If you need more explanations about Blob, containers and uploading files in Azure in MASE. Please read our article about uploading files to the Blob storage.

如果您需要有关Blob的更多说明,请在MASE中的Azure中使用容器和上传文件。 请阅读有关将文件上传到Blob存储的文章 。

In the container created, upload the customers.csv file:

在创建的容器中,上传customers.csv文件:



Once that you have the file in Azure, in your local machine, open SSMS and create a new T-SQL sentence in a database:

在Azure中拥有文件后,请在本地计算机中打开SSMS并在数据库中创建新的T-SQL语句:

 
CREATE MASTER KEY ENCRYPTION BY PASSWORD='MyP@ssword123secretword';
 

A master key is a symmetric key used to protect private keys.

主密钥是用于保护私钥的对称密钥。

The next step is to create a credential. In this example, the database credential is named mycredential, but you can use any name. The identity is credential and it can also be any name. Secret is the key to access to the Azure Storage Account:

下一步是创建证书。 在此示例中,数据库凭证名为mycredential,但是您可以使用任何名称。 身份是凭证,也可以是任何名称。 秘密是访问Azure存储帐户的关键:

 
CREATE DATABASE SCOPED CREDENTIAL mycredential  
WITH IDENTITY = 'credential', Secret = 'VgqTsEqg1beXqhb+mO/wZh9OUZ+ByhKJJj7qc9pBne9e+BsdFo4Mfdl+u8Gh94tDqCR0/uNZ4KHr0r4WuK85lA==' 
 

You can get the Secret key that can be the primary or secondary key in the Azure Account. You can copy these keys from the MASE:

你可以得到的秘密密钥可以在Azure的帐户的主要次要钥匙 。 您可以从MASE复制以下密钥:


We will then create external data. This will connect to our Azure Storage Account:

然后,我们将创建外部数据。 这将连接到我们的Azure存储帐户:

 
CREATE EXTERNAL DATA SOURCE mycustomers
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@polybasestoragesqlshack.blob.core.windows/',
    CREDENTIAL = mycredential
);
 

Mycustomers is the name of the datasource and can be any name.

Mycustomers是数据源的名称,可以是任何名称。

The type of the external data is Hadoop. Location is the location of the file stored. Mycontainer is the name of the container created in figure 5 and polybasesstoragesqlshack is the name of the Azure Storage Account created on step 4. Blob.core.windows is part of the address of the container that can be retrieved in MASE:

外部数据的类型是Hadoop。 位置是存储文件的位置。 Mycontainer是在图5中创建的容器的名称,而polybasesstoragesqlshack是在步骤4中创建的Azure存储帐户的名称。Blob.core.windows是可以在MASE中检索的容器地址的一部分:


A typical error when you try to create external data is the following:

尝试创建外部数据时的典型错误如下:

OLE DB provider “SQLNCLI11” for linked server “(null)” returned message “Login timeout expired”.
OLE DB provider “SQLNCLI11” for linked server “(null)” returned message “A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.”.
Msg 10061, Level 16, State 1, Line 9
TCP Provider: No connection could be made because the target machine actively refused it.

链接服务器“(null)”的OLE DB提供程序“ SQLNCLI11”返回消息“登录超时已过期”。
链接服务器“(null)”的OLE DB提供程序“ SQLNCLI11”返回消息“建立与SQL Server的连接时发生了与网络相关或特定于实例的错误。 找不到服务器或无法访问服务器。 检查实例名称是否正确,以及是否将SQL Server配置为允许远程连接。 有关更多信息,请参见SQL Server联机丛书。”。
Msg 10061,第16级,状态1,第9行
TCP提供程序:由于目标计算机主动拒绝连接,因此无法建立连接。

This error occurs because the SQL Server PolyBase Engine Service is down. Make sure that the PolyBase Services are running in SQL Server Configuration Manager or any tool of your preference to handle Windows Services:

发生此错误的原因是SQL Server PolyBase Engine服务已关闭。 确保PolyBase Services正在SQL Server配置管理器中运行,或确保您可以使用任何首选工具来处理Windows Services:



In SSMS, you will be able to see the external data source created in Database>External Resources>External Data Sources:

在SSMS中,您将能够看到在Database> External Resources> External Data Sources中创建的外部数据源:


We have access to external data. In this case, the customer.csv file. We need to specify the format of the file. We will create a format for the external file:

我们可以访问外部数据。 在这种情况下,customer.csv文件。 我们需要指定文件的格式。 我们将为外部文件创建一种格式:

 
CREATE EXTERNAL FILE FORMAT csvformat 
WITH ( 
    FORMAT_TYPE = DELIMITEDTEXT, 
    FORMAT_OPTIONS ( 
        FIELD_TERMINATOR = ','
    ) 
);
 

It is a CSV file, and then the delimiter is a comma. If the external file format is OK you will be displayed in SSMS, in Database>External Resources>External File formats:

它是一个CSV文件,然后定界符是一个逗号。 如果外部文件格式正确,则将以SSMS的形式显示在数据库>外部资源>外部文件格式中:


We will create an external table to query the csv file like a SQL table:

我们将创建一个外部表来查询csv文件,就像SQL表一样:

 
CREATE EXTERNAL TABLE customerstable
( 
    	name VARCHAR(128),
    	lastname VARCHAR(128),
email VARCHAR(100)
) 
WITH 
( 
    LOCATION = '/', 
    DATA_SOURCE = mycustomers, 
    FILE_FORMAT = csvformat
 
)
 

We use the sentence create external table specified a name of our preference. We define the customers of the file and the data type like any table in SQL Server.

我们使用这句话创建外部表,指定我们的偏好名称。 我们定义文件和数据类型的客户,就像SQL Server中的任何表一样。

LOCATION is the location of the csv file. If the customer.csv file were inside a folder named country and then inside a folder named city, the location would be LOCATION=/country/city. In this example, the file is on the root because it is not inside any folder in the container.

LOCATION是csv文件的位置。 如果customer.csv文件位于名为country的文件夹中,然后位于名为city的文件夹中,则位置为LOCATION = / country / city。 在此示例中,该文件位于根目录中,因为它不在容器中的任何文件夹内。

DATA_SOURCE is the data source created in figure 10 and FILE_FORMAT is the format created on figure 11.

DATA_SOURCE是在图10中创建的数据源,而FILE_FORMAT是在图11中创建的格式。

If everything is OK, you will be able to see the external table created in SSMS in Database>Tables>External tables:

如果一切正常,您将能够在数据库>表>外部表中查看在SSMS中创建的外部表:


Now you can query the csv file like any table using T-SQL:

现在,您可以使用T-SQL像查询任何表格一样查询csv文件:

 
SELECT [name]
      ,[lastname]
      ,[email]
  FROM [AdventureWorks2016CTP3].[dbo].[customerstable]
 

You will be able to see your data as if it were a SQL Server Table:

您将能够像查看SQL Server表一样查看数据:

结论 (Conclusion)

In this article, we explained how to install PolyBase and how to query a CSV file stored in an Azure Storage account.

在本文中,我们解释了如何安装PolyBase以及如何查询存储在Azure存储帐户中的CSV文件。

翻译自: https://www.sqlshack/sql-server-2016-polybase-tutorial/

更多推荐

SQL Server 2016 – PolyBase教程