I passed the Databricks Certified Data Engineer Associate (Version 3)!
This is Abe from the Lakehouse Department of the GLB Division. I usually write about verification work around Databricks, but this time I would like to talk about how I studied for, and approached, the exam above. First, by way of self-introduction, here is a look back at the events leading up to the exam.
- Joined APC in January 2023
- Late January: Assigned to the Lakehouse department. Started learning Databricks with the Databricks Academy materials, but struggled with infrastructure-related knowledge that SQL alone doesn't cover, and with behavior that differs from one RDBMS to another. → The course assumes 4 days, but it took me about 2 weeks.
- Early February: Bought a Udemy practice exam course, but couldn't solve the questions at all; I nearly gave up and set it aside for a while.
- End of February: Started studying for the certification exam in earnest, learning mainly from Udemy and the Azure Databricks documentation.
- March: Posted an article about Databricks verification on the technical blog (the Lakehouse department's first post). Completed the Udemy course and finished a second round of the practice exams. Joined a Databricks project.
- April: Re-solved the Udemy practice exams and completed a third round. If you repeat practice exams, you start to memorize which choice is correct and risk overfitting to the exams themselves, so read every answer choice carefully and review the documentation for anything you are unsure about.
- April 23: Passed the exam
That was a long list, but in short: I passed about 3 months after I started using Databricks, after about 2 months of exam study.
- Databricks Certified Data Engineer Associate exam scope
- Specific study method
- Parts I struggled with
- What I'm glad I studied after finishing the exam
- Reference article
Databricks Certified Data Engineer Associate exam scope
To pass an exam, it is important to know your opponent first. Checking the outline of this exam on the official Databricks page, in a nutshell, I believe it tests whether you can solve relatively simple problems using each component of the Databricks Lakehouse Platform, SQL, and Python according to the use case.
Below is the breakdown of question domains and their weights, excerpted from the official page and summarized in my own words.
- Databricks Lakehouse Platform – 24% (11/45): the components and architecture of the Databricks Lakehouse Platform, Delta Lake, and table operations
- ELT with Spark SQL and Python – 29% (13/45): ELT on databases, tables, and views using SQL and Python
- Incremental Data Processing – 22% (10/45): batch and streaming processing with Auto Loader, the layers of the medallion architecture, and Delta Live Tables
- Production Pipelines – 16% (7/45): building jobs and setting up Databricks SQL dashboards
- Data Governance – 9% (4/45): Unity Catalog and granting permissions on tables and views to teams and users
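To give a concrete taste of the Data Governance domain: Unity Catalog permissions are granted with standard SQL GRANT statements. The catalog, schema, table, and group names below are hypothetical examples of mine, not from the exam:

```sql
-- Hypothetical names: catalog `main`, schema `sales`, table `orders`, group `analysts`
GRANT USE CATALOG ON CATALOG main TO `analysts`;
GRANT USE SCHEMA  ON SCHEMA  main.sales TO `analysts`;
GRANT SELECT      ON TABLE   main.sales.orders TO `analysts`;

-- Verify what has been granted on the table
SHOW GRANTS ON TABLE main.sales.orders;
```

Note that SELECT on a table only works if the group also has USE CATALOG and USE SCHEMA on the parents, which is exactly the kind of detail the exam asks about.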
Looking at the number of questions in each domain, you can see that ELT with Spark SQL and Python is asked the most, so you need a solid grasp of basic Python and SQL syntax.
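As a rough illustration of the SQL side of that domain, the exam leans on statements such as CTAS and MERGE INTO. The table and column names below are my own placeholders:

```sql
-- Create a table from query results (CTAS)
CREATE TABLE customers_clean AS
SELECT id, lower(email) AS email
FROM customers_raw
WHERE email IS NOT NULL;

-- Upsert incoming records into the target table
MERGE INTO customers_clean AS t
USING customers_updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.email = s.email
WHEN NOT MATCHED THEN INSERT (id, email) VALUES (s.id, s.email);
```

Being able to predict what each clause does to the data, rather than just recognizing the keywords, is what the questions actually test.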
Specific study method
Before getting into specific study methods, I recommend starting by actually touching a Databricks environment. Databricks occasionally holds intensive hands-on training sessions that help you understand its suite of features.
Also, after creating a Databricks workspace on your preferred cloud provider, I recommend studying the hands-on training code published on GitHub.
1. Take a practice exam course on Udemy: Databricks Certified Data Engineer Associate Practice Exams
2. Refer to the Azure Databricks documentation: Azure Databricks documentation
3. Upload verification results to a technical blog (recommended): Verification Blog
First, while working through the Udemy practice exams, I repeatedly looked up words and code I didn't know in the Azure Databricks documentation. The official documentation is the best-organized resource, so I deepened my understanding by treating it like a dictionary and going back and forth between it and Udemy. Knowing that I would forget things even after looking them up once, I collected my notes in my Slack direct messages: specifically, I saved the URL of the relevant documentation page together with a summary of what I had found, and searched through those notes whenever I hit a similar question I couldn't answer.

The third study method, posting to a technical blog, may seem like a high hurdle, but it made a real difference in my understanding of Databricks. I had posted an article about working with Delta tables, and I think that is why I could answer most of the questions about Delta Lake and table operations. On the other hand, I had not yet fully verified Delta Live Tables and Databricks SQL, and my accuracy in those areas was lower. If you are aiming to pass the exam, I encourage you to post to a technical blog as well.
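For reference, the Delta Live Tables syntax covered by the Incremental Data Processing domain looks roughly like the sketch below; the storage path and table names are placeholders of mine, and the details may differ in current Databricks releases:

```sql
-- Bronze layer: incrementally ingest raw JSON files with Auto Loader
CREATE OR REFRESH STREAMING LIVE TABLE orders_bronze
AS SELECT * FROM cloud_files("/mnt/raw/orders", "json");

-- Silver layer: cleaned records, with a data-quality expectation
CREATE OR REFRESH STREAMING LIVE TABLE orders_silver (
  CONSTRAINT valid_order EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT order_id, CAST(amount AS DOUBLE) AS amount
FROM STREAM(LIVE.orders_bronze);
```

The exam tends to ask which layer a given transformation belongs to, and what an expectation's ON VIOLATION clause does to failing rows.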
By the way, there is also an official [mock test](https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf), so it would be a good idea to take it before the real exam.
Parts I struggled with
The hardest part was understanding the components. At first I tried to memorize everything, because there was so much to remember, but what actually worked was using the Databricks environment to see which settings live where. Also, since there are SQL queries specific to Databricks, I tried to understand them by actually running them.
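To give an idea of those Databricks-specific queries, here are a few Delta Lake commands that have no direct equivalent in a typical RDBMS (the table name `events` is hypothetical):

```sql
-- Audit the table's change history
DESCRIBE HISTORY events;

-- Time travel: query the table as of an earlier version
SELECT * FROM events VERSION AS OF 3;

-- Compact small files and co-locate data for faster reads
OPTIMIZE events ZORDER BY (event_date);

-- Remove data files no longer referenced by the table
VACUUM events;
```

Running these against a throwaway table makes it much easier to remember what each one does than memorizing the syntax alone.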
What I'm glad I studied after finishing the exam
Understanding the basics of Databricks helped me follow discussions on the project site and write technical blog posts. Also, since the exam asks you to answer according to use cases, I feel I learned which components to use for which use case.
In this article, I wrote about my study toward passing the Databricks Certified Data Engineer Associate (Version 3). Because exam preparation gave me a grasp of the basics of the Databricks Lakehouse Platform, I would like to put it to use in technical blogs and projects. We will continue to post Databricks-related articles, so we hope you will visit again. We provide a wide range of support, from introducing a data analysis platform built on Databricks to helping you run it in-house. If you are interested, please contact us.
We are also looking for people to work with us! We look forward to hearing from anyone who is interested in APC.
Translated by Johann