tags: Ozone

Ozone with pyarrow

HDDS-2443

  • Approach to try: pyarrow, which uses the Java client under the hood.

How it works

  • pyarrow uses _hadoop_classpath_glob to collect all Hadoop jars and the libraries they require.

  • To make pyarrow support o3fs, hadoop-ozone-filesystem-0.5.0-SNAPSHOT.jar and hadoop-ozone-filesystem-lib-current-0.5.0-SNAPSHOT.jar must be picked up by _hadoop_classpath_glob.

  • One way to do this is to extend HADOOP_CLASSPATH: export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$OZONE_HOME/share/ozone/lib/hadoop-ozone-filesystem-lib-current-$OZONE_VERSION.jar
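
The same classpath change can also be made from Python before connecting. The following is only a minimal sketch: the helper name `with_ozone_jar` and the example install path `/opt/ozone` are hypothetical, and it assumes the jar lives under `share/ozone/lib` as in the export line above.

```python
def with_ozone_jar(existing, ozone_home, ozone_version):
    """Return the `existing` classpath with the Ozone filesystem jar appended.

    `with_ozone_jar` is a hypothetical helper; the jar location mirrors the
    export line in this note and may differ in other Ozone layouts.
    """
    jar = (
        f"{ozone_home.rstrip('/')}/share/ozone/lib/"
        f"hadoop-ozone-filesystem-lib-current-{ozone_version}.jar"
    )
    # Split on ':' as in the shell export, dropping empty entries.
    entries = [entry for entry in existing.split(":") if entry]
    # Avoid adding the same jar twice if the helper is called repeatedly.
    if jar not in entries:
        entries.append(jar)
    return ":".join(entries)


# "/opt/ozone" is an assumed install location, used only for illustration.
print(with_ozone_jar("", "/opt/ozone", "0.5.0-SNAPSHOT"))
```

To apply it, one would assign the result to `os.environ["HADOOP_CLASSPATH"]` before calling `pa.hdfs.connect`, on the assumption that the classpath is read when the connection is first set up.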

What's going on

  • version

  • setting

    • core-site.xml of Ozone
      <configuration>
        <property>
          <name>fs.defaultFS</name>
          <value>o3fs://buc.vol</value>
        </property>
      </configuration>
  • python code

    import pyarrow as pa
    import pyarrow.parquet as pq

    # {ip_of_Ozone}, {your_port} and {username} are placeholders to fill in.
    fs = pa.hdfs.connect(host='o3fs://buc.vol.{ip_of_Ozone}', port={your_port}, user={username})

    # List the keys at the root of the bucket.
    rootpath = '/'
    print(fs.ls(rootpath))

    # Print the contents of a single key.
    catpath = '/key'
    print(fs.cat(catpath))
    
  • execute

    • python3.x python_code.py
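
Before running the script, it can help to confirm that core-site.xml really points fs.defaultFS at an o3fs URI. The sketch below uses only the Python standard library; `read_default_fs` is a hypothetical helper, and the sample XML simply repeats the setting shown above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def read_default_fs(core_site_xml):
    """Return the fs.defaultFS value from a core-site.xml string, or None."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return (prop.findtext("value") or "").strip()
    return None


# Sample content copied from the core-site.xml setting in this note.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>o3fs://buc.vol</value>
  </property>
</configuration>"""

default_fs = read_default_fs(SAMPLE)
uri = urlparse(default_fs)
# The scheme should be o3fs; the authority carries the bucket and volume names.
print(uri.scheme, uri.netloc)
```

For a real deployment, the XML string would be read from the core-site.xml file on the classpath instead of the inline sample.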