AWS Help
by JJJCR from LinuxQuestions.org on (#52KXV)
Hi guys, if anyone is familiar with AWS, I need your input on creating Glue partitions.
I have JSON data in an S3 bucket.
It is picked up by the crawler, and a table is created.
The issue I have is that the table is named after the parent bucket, and any new data added becomes part of this same table.
The desired output is a separate table per subfolder.
Example:
BucketBaseData-->TestData1-->Region1-->Results1-->....
TestData1 (the subfolder directly under the parent bucket)
Partitions as defined on the table:
Region1 --> Partition0
Results1 --> Partition1
.... other columns follow
Output after running the crawler:
Table Name is: BucketBaseData
Table Schema:
Partition_0 Region1 Results1
TestData1 Region1 Results1
Basically, Partition_0 is being added automatically by the AWS Glue crawler.
I don't know how to name the table "TestData1" and name the partitions as defined on my table.
Desired Output:
Table Name: TestData1 (I want a separate table for each subfolder directly under the parent bucket)
Table Schema:
Partition0 = Region1
Partition1 = Results1
...followed by other columns
So, if I have another subfolder in the BucketBaseData bucket:
BucketBaseData-->TestData1-->Region1a-->Results1a-->....
There would be another table called: Region1a
This page: https://docs.aws.amazon.com/glue/lat...artitions.html
says the following:
For Apache Hive-style partitioned paths in key=val style, crawlers automatically populate the column name using the key name.
Otherwise, it uses default names like partition_0, partition_1, and so on.
To change the default names on the console, navigate to the table, choose Edit Schema, and modify the names of the partition columns there.
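Going by the quoted documentation, one way to get meaningful partition column names without editing the schema afterwards is to write the objects under key=val folders. The helper below is a minimal sketch that only builds such an S3 object key; the column names `region` and `results` and the file name `data.json` are hypothetical placeholders, not anything Glue requires.

```python
from typing import Dict


def hive_style_key(table_prefix: str, partitions: Dict[str, str], filename: str) -> str:
    """Return an S3 object key with Hive-style key=val partition folders.

    table_prefix is the folder meant to be the table root (e.g. "TestData1");
    partitions maps partition column names to values, in folder order.
    """
    segments = [table_prefix.strip("/")]
    for column, value in partitions.items():
        # Each partition level becomes a "column=value" folder.
        segments.append(f"{column}={value}")
    segments.append(filename)
    return "/".join(segments)


if __name__ == "__main__":
    key = hive_style_key(
        "TestData1",
        {"region": "Region1", "results": "Results1"},  # hypothetical column names
        "data.json",
    )
    print(key)  # TestData1/region=Region1/results=Results1/data.json
```

With paths in this shape, the crawler should name the partition columns `region` and `results` instead of `partition_0` and `partition_1`.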
--- Is there a way to override the default partition path? If so, how?
That way, I could have a separate table for each subfolder.
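For the separate-table question, one possible approach is to give the crawler one S3 target per subfolder instead of the bucket root. Another is the crawler grouping option `TableLevelConfiguration`, which, as I understand the Glue crawler configuration docs, sets the absolute S3 path level at which tables are created (level 1 being the bucket). The sketch below only builds keyword arguments in the shape boto3's `glue.create_crawler` accepts (`Name`, `Role`, `DatabaseName`, `Targets`, and the JSON `Configuration` string). The crawler name, IAM role, database, and the second subfolder `TestData2` are hypothetical, and the table-level semantics should be checked against the current docs before relying on it.

```python
import json
from typing import Any, Dict, List


def crawler_request(bucket: str, subfolders: List[str], table_level: int = 2) -> Dict[str, Any]:
    """Build keyword arguments for boto3's glue.create_crawler.

    Uses one S3 target per subfolder, plus a Grouping setting asking the
    crawler to create tables at the given absolute path level
    (assumed: level 1 = bucket, level 2 = folders directly under it).
    """
    return {
        "Name": "testdata-crawler",     # hypothetical crawler name
        "Role": "GlueCrawlerRole",      # hypothetical IAM role
        "DatabaseName": "testdata_db",  # hypothetical Glue database
        "Targets": {
            "S3Targets": [
                {"Path": f"s3://{bucket}/{sub.strip('/')}/"} for sub in subfolders
            ]
        },
        "Configuration": json.dumps(
            {"Version": 1.0, "Grouping": {"TableLevelConfiguration": table_level}}
        ),
    }


if __name__ == "__main__":
    # "TestData2" is a hypothetical second subfolder.
    request = crawler_request("BucketBaseData", ["TestData1", "TestData2"])
    print(json.dumps(request, indent=2))
    # With boto3 installed and AWS credentials configured, this could be sent as:
    #   import boto3
    #   boto3.client("glue").create_crawler(**request)
```

Keeping the request as a plain dict makes it easy to inspect before anything is created in the account.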
Hope I made my explanation clear.
Thank you for any input. Cheers!