**How to enable Snappy compression on an Avro-backed Hive table when STOREing from Pig using HCatWriter**

I have a Hive table I created like this:

```sql
set …
set …
CREATE EXTERNAL TABLE mytable (
  aaa STRING,
  …
```

Once the Avro table is created, to enable Snappy compression the following properties need to be set under the Environment SQL of the Hive connection:

```sql
set …
set …
```

Starting with Hive version 0.…

**How do I read hive-partitioned Avro serialised data in Glue without having a crawler?**

I have a bucket which has lots of data in Avro, partitioned in a "hive" style, for example:

```
s3://my-bucket/year=2018/month=03/day=25/file-name.avro
```

I am trying to access this data in Glue as:

```scala
val predicate = "year=2018 and month=03"
val src = glueContext.getSourceWithFormat(
  connectionType = "s3",
  format = "avro",
  options = opts,
  pushDownPredicate = predicate)
```

But it wouldn't accept a pushdown predicate at all:

```
error: unknown parameter name: pushDownPredicate
```

I have also tried to add "partitionKeys": …

Currently it's not possible to use a pushdown predicate in `getSource()` and `getSourceWithFormat()`, since internally Glue validates that the fields in the expression are actually partitions. `getCatalogSource()` loads this information from the Glue Catalog and passes it to the validator; for `getSource()` and `getSourceWithFormat()` there is no way to pass a custom list of data partitions to be used for validation, and therefore a pushdown predicate cannot be used.

As a workaround you can generate paths which include the data partitions and pass them to `getSourceWithFormat()` through options. For example, if you want to load data for year=2018 and (month=03 or month=04), then your code should look like this:

```scala
val paths = Array(…
val source = glueContext.getSourceWithFormat(…
```

Please note that the source DynamicFrame won't contain the partition columns year and month, so you might want to add them manually.
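A minimal sketch of the path-generation workaround described above, for year=2018 and months 03/04. The bucket name comes from the question's example path; the commented Glue call and its `"paths"` option key are assumptions about the Glue Scala API, not something recovered from the original post.

```scala
// Sketch: build explicit partition paths instead of using a pushdown predicate.
// Months and bucket are illustrative values from the question.
val bucket = "s3://my-bucket"
val months = Seq("03", "04")
val paths = months.map(m => s"$bucket/year=2018/month=$m/")

// In a Glue job these paths would then be passed through the options map,
// roughly like this (hedged -- the exact option format may differ by Glue version):
// val source = glueContext.getSourceWithFormat(
//   connectionType = "s3",
//   format = "avro",
//   options = JsonOptions(Map("paths" -> paths))
// ).getDynamicFrame()

println(paths.mkString("\n"))
```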
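Since the resulting DynamicFrame won't contain the partition columns, one common approach (an assumption on my part, not spelled out in the post) is to derive `year` and `month` from each record's file path. Here is a pure-Scala sketch of the extraction; in an actual Glue/Spark job you would typically apply the same idea via Spark's `input_file_name()` function and `withColumn`.

```scala
// Sketch: recover a hive-style partition value (e.g. year, month) from a path.
// Shown as a plain function for clarity; in Glue you'd apply this per-row
// using input_file_name() on the converted DataFrame.
def partitionValue(path: String, key: String): Option[String] = {
  val pattern = (key + "=([^/]+)").r
  pattern.findFirstMatchIn(path).map(_.group(1))
}

val examplePath = "s3://my-bucket/year=2018/month=03/day=25/file-name.avro"
println(partitionValue(examplePath, "year"))
println(partitionValue(examplePath, "month"))
```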
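Returning to the Hive/Pig question earlier: the specific `set` statements are truncated in the post, but for Avro-backed tables Snappy output compression is commonly enabled with properties along these lines (an assumption based on Hive's AvroSerDe behaviour, not recovered from the original text):

```sql
-- Hedged sketch: typical properties for Snappy-compressed Avro output in Hive.
SET hive.exec.compress.output=true;
SET avro.output.codec=snappy;
```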