Welcome to new things

[Technical] [Electronic work] [Gadget] [Game] memo writing

Thoughts on using AWS Glue

I tried AWS Glue a while ago, and here are my thoughts on it.

AWS Glue is built on Apache Spark. It was my first time touching Spark, which I found interesting, and I had planned to put together a usage guide for Glue once I started using it seriously.

However, circumstances arose where we wanted to use Google BigQuery as our data output destination. In the end we went with Google Dataproc instead of AWS Glue, so I decided to just summarize my impressions from the AWS Glue trial.

What is AWS Glue?

AWS Glue belongs to the category of ETL tools.

However, its functionality is quite limited. Put simply, AWS Glue is:

  • 'A tool for creating AWS Redshift Spectrum and AWS Athena tables from S3 data files.'

The beauty of both Redshift Spectrum and Athena is that they can query S3 files directly. To do so, however, a table definition (data catalog) must be created in advance.

Moreover, the table definition covers not just the schema but also the range of data. For data that is added daily, such as logs, the table definition must be updated every day to cover the newly added data.
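To make the daily update concrete, here is a minimal sketch of the kind of statement that has to be issued for each new day of data. It assumes the "data range" is expressed as Hive-style partitions keyed by a `dt` column, with one S3 folder per day; the table name `access_logs`, the bucket `example-bucket`, and the helper names are hypothetical and not taken from my actual setup.

```python
from datetime import date, timedelta


def add_partition_ddl(table: str, s3_prefix: str, day: date) -> str:
    """Build a Hive-style DDL statement that registers one day's S3 folder."""
    dt = day.isoformat()  # ISO format, e.g. "2024-01-30"
    # Assumed layout: <prefix>/dt=<YYYY-MM-DD>/
    location = f"{s3_prefix.rstrip('/')}/dt={dt}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{dt}') LOCATION '{location}'"
    )


def ddl_for_range(table: str, s3_prefix: str, start: date, days: int) -> list[str]:
    """Return one statement per day, starting at `start`."""
    return [
        add_partition_ddl(table, s3_prefix, start + timedelta(days=i))
        for i in range(days)
    ]


if __name__ == "__main__":
    # Hypothetical table and bucket names, for illustration only.
    for stmt in ddl_for_range(
        "access_logs", "s3://example-bucket/logs", date(2024, 1, 30), 3
    ):
        print(stmt)
```

Running something like this by hand every day is exactly the chore that gets tedious as the number of tables grows.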

AWS Glue is a tool that automates the process of creating and updating table definitions.
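As a rough illustration of that automation, the sketch below builds the request for a scheduled crawler over an S3 path. It only assembles the keyword arguments for boto3's Glue `create_crawler` call and does not contact AWS; the crawler name, role ARN, database name, S3 path, and the daily schedule are all hypothetical values chosen for the example.

```python
def crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Build keyword arguments for boto3's Glue `create_crawler` call."""
    return {
        "Name": name,
        "Role": role_arn,            # IAM role the crawler runs as
        "DatabaseName": database,    # data catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Six-field cron expression; this one means 01:00 UTC every day.
        "Schedule": "cron(0 1 * * ? *)",
    }


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    request = crawler_request(
        name="daily-log-crawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        database="example_db",
        s3_path="s3://example-bucket/logs/",
    )
    print(request)
    # With boto3 installed and AWS credentials configured, the request
    # could be sent like this (not executed here):
    #   import boto3
    #   boto3.client("glue").create_crawler(**request)
```

Once such a crawler runs on a schedule, the table definition should follow the data without manual updates.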

Technical background

Both Redshift Spectrum and Athena are based on technology from the Hadoop ecosystem. The table definitions that Glue calls the data catalog are Apache Hive tables, so Glue is a tool for creating Hive tables from S3 files.

In addition to creating and updating table definitions through crawling, Glue also lets users process data in finer detail by writing Apache Spark programs.

Differences from AWS EMR

When I think of Hadoop, Hive, and Spark, AWS EMR comes to mind.

In effect, AWS Glue is built on the same technology as AWS EMR, and the same table definitions can also be created as Hive tables on AWS EMR.

Glue differs from EMR in that billing is pay-as-you-go: you are charged only while a job is running. It also differs significantly in that the cluster autoscales according to the weight of the jobs being processed.

In other words, Glue is something like an EMR customized for table definition work.
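The billing difference is easier to see with numbers. The sketch below compares a cluster that is kept running all month with per-job billing. Every rate and duration in it is a made-up placeholder, not an actual AWS price, and it assumes the EMR cluster is left running rather than started and stopped for each job.

```python
def always_on_cost(hourly_rate: float, hours: float) -> float:
    """Cost of a cluster that is billed for every hour it stays up."""
    return hourly_rate * hours


def per_job_cost(rate_per_unit_hour: float, units: int,
                 hours_per_run: float, runs: int) -> float:
    """Cost when billing applies only while jobs are running."""
    return rate_per_unit_hour * units * hours_per_run * runs


if __name__ == "__main__":
    # Placeholder figures for illustration only, not real prices.
    cluster = always_on_cost(hourly_rate=1.0, hours=720)  # 30 days x 24 hours
    jobs = per_job_cost(rate_per_unit_hour=0.5, units=10,
                        hours_per_run=0.5, runs=30)       # one short job per day
    print(f"always-on cluster: {cluster}")
    print(f"per-job billing:   {jobs}")
```

For a short daily job the idle hours dominate the always-on cost, which is why per-job billing looks attractive for table definition work.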

Usability

Glue offers various GUI operations to lower the barrier to entry, so I expected to be able to use it like an ordinary ETL tool, but that did not work out at all.

There were many situations where the AWS Glue manual alone was not enough to solve a problem and knowledge of EMR, Hive, and Spark was required. It was faster to study those first and then come back to Glue.

I felt that the learning cost would be quite high if I tried to use it simply as an ETL tool without any prior knowledge of these technologies.

On the other hand, if you already have that prerequisite knowledge, Glue is easy to understand.

Impressions

Frankly, I think this is a service that existing EMR users would consider as a pay-as-you-go, autoscaling alternative to EMR.

I like Spark and use it, but is it easy to use? Would I recommend it to everyone? I'm not sure. Also, Hadoop and Spark have been on a downward trend for several years, and I don't see them becoming popular in the future.

And AWS Glue, Redshift Spectrum, and Athena, which are built on those technologies and expose their specifications quite plainly to users, do not seem likely to catch on either.

So, unfortunately, we concluded: "if you are not already an AWS EMR user, nah."

www.ekwbtblog.com
