APC 技術ブログ

株式会社エーピーコミュニケーションズの技術ブログです。

株式会社 エーピーコミュニケーションズの技術ブログです。

Introduction to Fivetran(3) - Link Fivetran and Databricks and import data from AWS RDS

1.Introduction

This is Matsuzaki from the lakehouse department of the glb business department. This time, I will explain how to create an ELT setting by linking Fivetran with Databricks. In this article, we will create a connection setting from Amazon RDS (MySQL) using the MySQL RDS connector.

※Click here for past articles.

table of contents

2. References

3. Prerequisites

  • A Databricks workspace is a prerequisite. Our company Mr. Abe explains how to create a workspace in the following article, so please take a look.

  • It is a prerequisite that you have already created a Fivetran account and linked with Databricks. If you have not created a Fivetran account, please refer to the previous article.

4. Preliminary work: prepare RDS instance for verification

4-1. Create RDS instance

First, create an Amazon RDS instance for verification on the Amazon console.

  • MariaDB Select the type of DB engine.
  • Dev/Test Select the DB instance size.

4-2. Create TLS settings

After creating and launching the RDS instance, publicly accessible enable the TLS setting by enabling it in the AWS management console.*1

After completing the setting change, connect to RDS with mysql-client and confirm that the TLS setting is enabled.

mysql> show global variables like '%ssl%'

+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name       | Value                                                                                                                                                                                                                                                                                   |
+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| have_openssl        | YES                                                                                                                                                                                                                                                                                     |
| have_ssl            | YES                                                                                                                                                                                                                                                                                     |
| ssl_ca              | /rdsdbdata/rds-metadata/ca-cert.pem                                                                                                                                                                                                                                                     |
| ssl_capath          |                                                                                                                                                                                                                                                                                         |
| ssl_cert            | /rdsdbdata/rds-metadata/server-cert.pem                                                                                                                                                                                                                                                 |
| ssl_cipher          | ECDHE-RSA-xxx
| ssl_crl             |                                                                                                                                                                                                                                                                                         |
| ssl_crlpath         |                                                                                                                                                                                                                                                                                         |
| ssl_key             | /rdsdbdata/rds-metadata/server-key.pem                                                                                                                                                                                                                                                  |
| version_ssl_library | OpenSSL 1.1.1q  5 Jul 2022                                                                                                                                                                                                                                                              |
+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
10 rows in set (0.19 sec)        
mysql> status
--------------
mysql  Ver 8.0.32-0ubuntu0.22.04.2 for Linux on x86_64 ((Ubuntu))
SSL:                    Cipher in use is TLS_AES_256_GCM_SHA384
SSL session reused:     true
--------------

4-3. Preparation of sample data

Finally, prepare the test data to be imported into Fivetran. For this verification, we used the official MySQL world database according to the following procedure, but the content of the data does not matter here

5. Create ELT settings

5-1. Creating ELT settings and entering setting values

From the Fivetran admin screen, Connectors go to and Add connector click on the top right of the screen.

The screen will change to the connector selection screen, so MySQL RDS select the connector.

You will be taken to the setting value input screen for ELT, so enter/select various setting values.

The meaning of the setting values ​​is summarized below.

  • Destination schema :This field specifies the prefix given to the schema name when copying the schema (database) from RDS to Databricks. Here mysql_rds we used the default value.
  • Host : A field that specifies the FQDN of the MySQL Server. Enter the RDS FQDN obtained from the AWS management console.
  • Port : This field specifies the connection port of MySQL Server. 3306 Specifies which is the default connection port for MySQL.
  • User : This field specifies the user when connecting to MySQL Server. Here, admin specify the RDS administrator account.*2
  • password : This field specifies the password for connecting to MySQL Server. Specify the password obtained when creating an RDS instance on the AWS management console.
  • Connection Method : A field to select the connection method when connecting from Fivetran to AWS RDS. Please refer to the official manual for details . Connect directly (TLS required)In this article, we choose for ease of configuration .
  • Update Method :A field that specifies how Fivetran will detect changes to the data on RDS after the initial Fivetran import is complete. Please refer tothe official manual for details. Here, select Detect Change via Fivetran Teleport Sync

5-2. TLS certificate trust

After entering all setting values, Save & Test click to move to Fivetran ⇒ Connectivity test screen to RDS. At this time, RDS fails to verify the trust chain of the TLS certificate used, and you will be asked if you want to manually trust it as follows, so trust it.

※We plan to investigate separately the cause of failure in the trust chain verification of the TLS certificate provided by RDS.

Once the manual trust is complete, the connectivity check will be retried and finally All connection tests passed! displayed. Click Continue.

5-3. Confirmation of Initial Sync and Ported Data

Start initial Sync Click here to go to the connector management screen.

Confirm that the initial import of data has started.

When the initial import is complete, the display will change to something like the one below.

If you check the Databricks Data Explorer screen, you can confirm that the specified schema and table have been created.

That's all for this article. How was it? As in the previous post (Google Sheets), I hope that you will be able to easily create ELT settings using Fivetran.

We provide a wide range of support, from the introduction of a data analysis platform using Databricks to support for in-house production. If you are interested, please contact us.

www.ap-com.co.jp

We are also looking for people to work with us! We look forward to hearing from anyone who is interested in APC.

Translated by Johann

www.ap-com.co.jp

*1:Fivetran ⇒ Amazon RDS(mySQL) への接続方法には、複数の選択肢がありますが、今回は FivetranがRDSのExportしているSQlエンドポイントに直接接続する方式を選択します。この方式に際しては、MySQL Server(RDS)側にてTLSが有効化されている必要があります。

*2:Fivetranの使用アカウントには、最小の権限のみが付与されることがベストプラクティスであるため、管理者アカウントを使うのは望ましい設定ではありません。今回は手順の簡素化のために管理者アカウントを指定しましたが、商用用途で作成の際は、Fivetran専用のアカウントを作成し、必要な権限のみを付与してください。