AWS Help..

JJJCR

from LinuxQuestions.org on 2020-04-24 13:49 (#52KXV)

Hi guys, is anyone here familiar with AWS? I need your input on creating Glue partitions.

I have JSON data in an S3 bucket.

The crawler picks it up and creates a table.

The issue is that the table is named after the parent bucket, and any new data I add becomes part of that same table.

The desired output is a separate table per sub-folder of the bucket.

Example:

BucketBaseData-->TestData1-->Region1-->Results1-->....

TestData1 (the sub-folder directly under the parent bucket)
Partitions as defined on the table:
Region1 --> Partition0
Results1 --> Partition1
.... other columns follow

Output, after running the crawler:
Table Name is: BucketBaseData

Table Schema:
Partition_0 Region1 Results1
TestData1 Region1 Results1

Basically, Partition_0 is being added automatically by AWS Glue Crawler.

I don't know how to get the table named "TestData1" with the partitions as defined on my table.

Desired Output:
Table Name: TestData1 (I want a separate table for each sub-folder directly under the parent bucket)

Table Schema:
Partition0 = Region1
Partition1 = Results1
...followed by other columns

So, if I have another sub-folder in the BucketBaseData bucket:

BucketBaseData-->TestData1-->Region1a-->Results1a-->....

There would be another table called: Region1a

This page, https://docs.aws.amazon.com/glue/lat...artitions.html, says the following:

For Apache Hive-style partitioned paths in key=val style, crawlers automatically populate the column name using the key name.

Otherwise, it uses default names like partition_0, partition_1, and so on.
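To illustrate the key=val naming mentioned here, this is a minimal sketch that rewrites an object key from my layout into Hive style. The helper name `hive_style_key`, the column names `region` and `results`, and the file name `data.json` are made up for illustration; the objects would still have to be copied to the new keys in S3 separately.

```python
from pathlib import PurePosixPath


def hive_style_key(key, partition_names, table_depth=1):
    """Rewrite an S3 object key so its partition folders use key=value names.

    table_depth is the number of leading folders that identify the table
    (1 here, for TestData1). The next len(partition_names) folders are
    treated as partition values.
    """
    parts = PurePosixPath(key).parts
    needed = table_depth + len(partition_names) + 1  # +1 for the file name
    if len(parts) < needed:
        raise ValueError(f"key {key!r} is too shallow for the requested partitions")

    head = parts[:table_depth]
    values = parts[table_depth:table_depth + len(partition_names)]
    tail = parts[table_depth + len(partition_names):]

    # Turn each partition folder into name=value form.
    renamed = [f"{name}={value}" for name, value in zip(partition_names, values)]
    return "/".join([*head, *renamed, *tail])


# Hypothetical object key following the layout in the example above.
new_key = hive_style_key("TestData1/Region1/Results1/data.json", ["region", "results"])
print(new_key)
```

With keys in this form, the quoted documentation suggests the crawler would name the partition columns `region` and `results` instead of partition_0 and partition_1.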

To change the default names on the console, navigate to the table, choose Edit Schema, and modify the names of the partition columns there.
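The same rename could probably be scripted instead of done on the console. Below is a rough sketch, assuming boto3's Glue client with its `get_table` and `update_table` calls; the helper names, the database name `testdb`, and the new column names are placeholders of mine, and the list of keys copied into `TableInput` is a subset I picked, not an exhaustive one.

```python
import copy

# Subset of the keys that update_table accepts inside TableInput. get_table
# also returns read-only fields (CreateTime, DatabaseName, ...) that are
# dropped here.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters",
}


def rename_partition_keys(table, new_names):
    """Build a TableInput dict from a get_table result with partitions renamed."""
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in TABLE_INPUT_KEYS
    }
    for column in table_input.get("PartitionKeys", []):
        # Keep the old name when no replacement was given for it.
        column["Name"] = new_names.get(column["Name"], column["Name"])
    return table_input


def apply_rename(database, table_name, new_names):
    """Fetch the table from Glue and write it back with renamed partitions."""
    import boto3  # imported lazily so the helper above runs without boto3

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue.update_table(
        DatabaseName=database,
        TableInput=rename_partition_keys(table, new_names),
    )


# Trimmed-down, hypothetical example of what get_table might return.
sample_table = {
    "Name": "bucketbasedata",
    "DatabaseName": "testdb",
    "PartitionKeys": [
        {"Name": "partition_0", "Type": "string"},
        {"Name": "partition_1", "Type": "string"},
    ],
}
renamed = rename_partition_keys(
    sample_table, {"partition_0": "region", "partition_1": "results"}
)
print([column["Name"] for column in renamed["PartitionKeys"]])
```

Calling `apply_rename("testdb", "bucketbasedata", {...})` would need AWS credentials and the right Glue permissions, so only the pure helper is exercised here.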

--- Is there a way to override the default partition path? If yes, how do I do it?
That way I could have a separate table for each sub-folder of the bucket.
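One thing that might be worth trying for the separate-tables part is the crawler's table level setting. This is only a sketch, assuming the crawler `Configuration` JSON accepts a `Grouping` / `TableLevelConfiguration` entry where level 1 is the bucket and level 2 is the first folder under it. The crawler name, role ARN, and database name are placeholders, and `crawler_settings` is a helper I made up.

```python
import json


def crawler_settings(name, role_arn, database, bucket, table_level=2):
    """Build keyword arguments for Glue's create_crawler call.

    table_level=2 is meant to ask for one table per folder directly under
    the bucket (the bucket itself being level 1); this is an assumption.
    """
    configuration = {
        "Version": 1.0,
        "Grouping": {"TableLevelConfiguration": table_level},
    }
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/"}]},
        # The API expects the configuration as a JSON string, not a dict.
        "Configuration": json.dumps(configuration),
    }


settings = crawler_settings(
    name="per-subfolder-crawler",  # placeholder
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder
    database="testdb",  # placeholder
    bucket="BucketBaseData",  # bucket name from the example above
)
print(settings["Configuration"])

# With credentials in place, the crawler could then be created with:
#   boto3.client("glue").create_crawler(**settings)
```

If the table level setting does not behave this way, an alternative is to list each sub-folder (for example s3://BucketBaseData/TestData1/) as its own entry in `S3Targets`.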

I hope my explanation is clear.
Thank you for any input. Cheers!

