Delta Lake is an open-source storage layer that brings ACID transactions and schema enforcement to Apache Spark, and it can be used to build a reliable data lake. Here is an example of how you can use Delta Lake to build a data lake:

  1. Start a Spark cluster: To use Delta Lake, you will need a running Spark cluster. You can do this by downloading the latest version of Spark from the Apache website, following the instructions to install it on your system, and adding the Delta Lake library to your Spark application.
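For a quick local setup, the session can also be created directly from Python with Delta Lake enabled. The following is a minimal sketch, assuming the delta-spark pip package is installed; the application name delta-lake-example is only a placeholder:
import pyspark
from delta import configure_spark_with_delta_pip
# Configure a SparkSession with the Delta Lake SQL extension and catalog
builder = (
    pyspark.sql.SparkSession.builder.appName("delta-lake-example")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
# configure_spark_with_delta_pip adds the Delta Lake jars matching the installed delta-spark version
spark = configure_spark_with_delta_pip(builder).getOrCreate()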
  2. Create a Delta Lake table: In order to store data in a Delta Lake table, you will need to create a new table. You can do this by using the Spark SQL API, for example:
from delta.tables import *
# Create an empty Delta table at the given path using the DeltaTable builder API
deltaTable = DeltaTable.create(spark).location("path/to/data").addColumn("id", "INT").addColumn("name", "STRING").execute()
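If you would rather register the table in the metastore, the same table can be created with Spark SQL; in this sketch the table name events is only an illustrative placeholder:
# Equivalent Spark SQL, registering a metastore table backed by the same path
spark.sql("CREATE TABLE IF NOT EXISTS events (id INT, name STRING) USING DELTA LOCATION 'path/to/data'")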
  3. Load data into the table: After creating the table, you can load data into it using the Spark DataFrame API, for example:
# Read the source CSV, treating the first row as a header and inferring column types
data = spark.read.format("csv").options(header="true", inferSchema="true").load("path/to/data.csv")
# Write the DataFrame to the table path in Delta format, replacing any existing data
data.write.format("delta").mode("overwrite").save("path/to/data")
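Later loads can add rows instead of replacing the table. A minimal sketch, assuming a second, hypothetical CSV file new_data.csv with the same schema:
# Append new rows to the existing Delta table instead of overwriting it
new_data = spark.read.format("csv").options(header="true", inferSchema="true").load("path/to/new_data.csv")
new_data.write.format("delta").mode("append").save("path/to/data")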
  4. Perform data operations: With data loaded into the table, you can perform data operations such as filtering, aggregation, and joins, for example:
# Filter the table down to rows where id is greater than 5
data_filtered = spark.read.format("delta").load("path/to/data").filter("id > 5")
# Count the name values for each id
data_aggregated = spark.read.format("delta").load("path/to/data").groupBy("id").agg({"name": "count"})
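A join works the same way as with any other Spark DataFrame. The following sketch assumes a second, hypothetical Delta table at path/to/other that also has an id column:
# Join the table with another Delta table on the shared "id" column
main_df = spark.read.format("delta").load("path/to/data")
other_df = spark.read.format("delta").load("path/to/other")
joined = main_df.join(other_df, on="id", how="inner")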
  5. Optimize performance: Delta Lake provides an OPTIMIZE command that compacts many small files into larger ones to improve read performance (available in recent versions of open-source Delta Lake). It can be run through Spark SQL, for example:
# Compact the small files in the table into larger ones
spark.sql("OPTIMIZE delta.`path/to/data`")
  6. Monitor and maintain: Finally, it is important to monitor the Delta Lake data lake and make sure that it is running smoothly. The Spark web UI shows the jobs that read and write the table, Delta Lake records every change in the table's transaction log, and cluster-level metrics can be tracked with external monitoring tools such as Grafana.
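One simple check is the table history recorded in the transaction log; a minimal sketch using the DeltaTable API:
# Show the table's change history: operations, timestamps, and versions
from delta.tables import DeltaTable
deltaTable = DeltaTable.forPath(spark, "path/to/data")
deltaTable.history().show()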
