###### tags: `Ozone`

Ozone with [pyarrow](https://pypi.org/project/pyarrow/)
===

### [HDDS-2443](https://issues.apache.org/jira/browse/HDDS-2443)
- Path to try: ```pyarrow```, which uses the Java client under the hood.

### How it works
- ```pyarrow``` uses [```_hadoop_classpath_glob```](https://github.com/apache/arrow/blob/master/python/pyarrow/hdfs.py#L161#L165) to collect all Hadoop jars and required libraries.
- To make ```pyarrow``` support ```o3fs```, ==```hadoop-ozone-filesystem-0.5.0-SNAPSHOT.jar```== and ==```hadoop-ozone-filesystem-lib-current-0.5.0-SNAPSHOT.jar```== must be picked up by ```_hadoop_classpath_glob```.
- One way to do this is to append the jar to ```HADOOP_CLASSPATH```: ```export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$OZONE_HOME/share/ozone/lib/hadoop-ozone-filesystem-lib-current-$OZONE_VERSION.jar```

### What's going on
- ### version
    - ```pyarrow``` version: [0.15.1](https://pypi.org/project/pyarrow/)
    - ```Hadoop``` version: [3.2.1](https://hadoop.apache.org/docs/stable/)
    - ```Ozone``` version: [master branch](https://github.com/apache/hadoop-ozone)
- ### setting
    - ```core-site.xml``` of ```Ozone```
    ```xml
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>o3fs://buc.vol</value>
        </property>
    </configuration>
    ```
- ### python code
    ```python
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Placeholders: replace these with the values for your cluster.
    ip_of_ozone = "<ip_of_Ozone>"
    your_port = 0  # replace with your port
    username = "<username>"

    # Connect through the Hadoop Java client, using the o3fs scheme
    # (bucket "buc" in volume "vol").
    host = "o3fs://buc.vol.{}".format(ip_of_ozone)
    fs = pa.hdfs.connect(host=host, port=your_port, user=username)

    # List the bucket root.
    rootpath = '/'
    print(fs.ls(rootpath))

    # Print the contents of the key named "key".
    catpath = '/key'
    print(fs.cat(catpath))
    ```
- ### execute
    - ```python3.x python_code.py```
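
The classpath step (appending the Ozone filesystem jar to ```HADOOP_CLASSPATH```) can also be done from Python before connecting. The following is only a minimal sketch: the helper name ```with_ozone_jar``` and the example values ```/opt/ozone``` and ```0.5.0-SNAPSHOT``` are assumptions for illustration, and it assumes a Linux-style classpath separated by ```:```.

```python
import os
import posixpath


def with_ozone_jar(ozone_home, ozone_version, current_classpath=""):
    """Return current_classpath with the Ozone filesystem lib jar appended.

    Hypothetical helper mirroring the `export HADOOP_CLASSPATH=...` line:
    the jar path is $OZONE_HOME/share/ozone/lib/
    hadoop-ozone-filesystem-lib-current-$OZONE_VERSION.jar
    """
    jar = posixpath.join(
        ozone_home, "share", "ozone", "lib",
        "hadoop-ozone-filesystem-lib-current-{}.jar".format(ozone_version),
    )
    # Drop empty entries so an unset HADOOP_CLASSPATH does not leave a stray ':'.
    entries = [e for e in current_classpath.split(":") if e]
    # Append only once, so calling the helper repeatedly is harmless.
    if jar not in entries:
        entries.append(jar)
    return ":".join(entries)


if __name__ == "__main__":
    # Example values only; use the OZONE_HOME and OZONE_VERSION of your install.
    ozone_home = os.environ.get("OZONE_HOME", "/opt/ozone")
    ozone_version = os.environ.get("OZONE_VERSION", "0.5.0-SNAPSHOT")
    classpath = with_ozone_jar(
        ozone_home, ozone_version, os.environ.get("HADOOP_CLASSPATH", "")
    )
    print(classpath)
```

Assigning the result to ```os.environ["HADOOP_CLASSPATH"]``` before calling ```pa.hdfs.connect``` should have the same effect as the shell ```export```, provided ```pyarrow``` ends up building its classpath through the ```hadoop classpath``` command.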