The dark side of the classloader magic
===
Let's say you have two classloaders in Java:
1. Classloader #1 which loads all the `java.*` and `javax.*` classes
2. Classloader #2 which loads all the `org.apache.hadoop.*` classes
Classloader #1 is the __parent__ classloader of Classloader #2.
When you use classloader #2 to load a class (class X):
1. If the class is available from #1, it is loaded from there (_"parent-first"_!)
2. If not, classloader #2 tries to load it itself
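The delegation order above can be observed with plain JDK classes. This is only a minimal sketch: `ParentFirstDemo` and `definingLoaderOf` are hypothetical names, and an empty `URLClassLoader` stands in for Classloader #2.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

class ParentFirstDemo {

    /** Loads className through a child loader and reports which loader actually defined it. */
    static ClassLoader definingLoaderOf(String className) {
        ClassLoader parent = ParentFirstDemo.class.getClassLoader();
        // The child has an empty classpath, so every class it returns comes from a parent.
        try (URLClassLoader child = new URLClassLoader(new URL[0], parent)) {
            return child.loadClass(className).getClassLoader();
        } catch (ClassNotFoundException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // java.lang.String is defined by the bootstrap loader, which Java reports as null.
        System.out.println(definingLoaderOf("java.lang.String"));
        // This demo class is defined by the parent loader, never by the child.
        System.out.println(definingLoaderOf(ParentFirstDemo.class.getName()));
    }
}
```

Even though the request goes to the child loader, the class that comes back was defined by a parent. That is all "parent-first" means.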
## Filtered Classloader
To support Hadoop 2 with Ozone (which depends on Hadoop 3), we introduced a new classloader hierarchy:
1. Classloader #1 loads all the `java.*` and `javax.*` classes
2. Classloader #2 loads the classes of the yarn/mapreduce application (hadoop2 classes)
3. Classloader #3 loads the Ozone classes (hadoop3)
Classloader #3 is isolated: it loads all the Ozone + Hadoop 3 classes from a very secret place.
Classloader #2 can't see any hadoop3 classes (because the parent of #2 is #1). This is intended: #2 is used by the Yarn/mapreduce application, and we don't want to poison its classpath with hadoop3 classes.
Of the Ozone classes, Classloader #2 can see (see = load) only `OzoneFileSystem` and `OzoneClientAdapter` (the latter is used by `OzoneFileSystem`).
To support Hadoop 2, the `OzoneClientAdapterImpl` is created with the help of Classloader #3, and all the other ozone/hadoop3 classes required by the adapter are loaded by #3.
But there is still a problem: all the classes which are visible to Classloader #2 (the mapreduce application) should be loaded by Classloader #2, for example `org.apache.hadoop.fs.Path` and `org.apache.hadoop.security.token.Token`.
There must be only one `Path` class in use: a `Path` loaded by #3 is not a `Path` for code loaded by #2.
An example of this problem:
If `OzoneClientAdapterImpl` (loaded by #3) implements `OzoneClientAdapter` (loaded by #3), then it is incompatible with the mapreduce application, because `OzoneClientAdapterImpl` (loaded by #3) is __NOT__ an instance of `OzoneClientAdapter` (loaded by #2).
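The JVM identifies a type by its name plus its defining classloader, which is why the same interface loaded twice is two unrelated types. The sketch below shows this with a stand-in interface; `TypeIdentityDemo`, `Adapter` and `IsolatedLoader` are hypothetical names, and it assumes the compiled `.class` file can be read back as a classpath resource.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class TypeIdentityDemo {

    /** Stand-in for a shared interface such as OzoneClientAdapter. */
    interface Adapter { }

    /** A loader without an application parent that defines a class from raw bytes. */
    static class IsolatedLoader extends ClassLoader {
        IsolatedLoader() {
            super(null);
        }

        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Reads the compiled bytes of a class from the classpath. */
    static byte[] classBytes(Class<?> type) {
        String resource = type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getClassLoader().getResourceAsStream(resource);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            if (in == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Defines the same bytes in two loaders and checks that the results are unrelated types. */
    static boolean sameNameButDifferentType() {
        String name = Adapter.class.getName();
        byte[] bytes = classBytes(Adapter.class);
        Class<?> first = new IsolatedLoader().define(name, bytes);
        Class<?> second = new IsolatedLoader().define(name, bytes);
        return first.getName().equals(second.getName())
            && first != second
            && !first.isAssignableFrom(second);
    }

    public static void main(String[] args) {
        System.out.println(sameNameButDifferentType());
    }
}
```

An object whose class implements the first `Adapter` cannot be cast to the second one, even though both come from identical bytes.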
We need an `OzoneClientAdapterImpl` (loaded by #3) which implements `OzoneClientAdapter` (loaded by #2). This means that `OzoneClientAdapter` must be shared between the two classloaders (the #2 version must be used by the #3 classes).
To achieve this, we modified the behaviour of Classloader #3:
1. For some selected [classes](https://github.com/apache/hadoop/blob/91cc19722796877f134fd04f60229ac47a1bd6e0/hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/FilteredClassLoader.java#L55) -- such as `OzoneClientAdapter` -- we use the parent-first approach (hadoop2 first).
2. For all the other ozone/hadoop3 classes we use child-first loading (from the isolated location)
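The two rules above could be sketched roughly as follows. This is not the actual `FilteredClassLoader` code: `SelectiveDelegationLoader`, `isParentFirst` and the constructor parameters are hypothetical, and falling back to the parent when a class is missing from the isolated location is a simplifying assumption.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Set;

class SelectiveDelegationLoader extends URLClassLoader {

    // Names of the classes that must be shared with the parent (hadoop2) side.
    private final Set<String> sharedClasses;

    SelectiveDelegationLoader(URL[] isolatedClasspath, ClassLoader parent,
                              Set<String> sharedClasses) {
        super(isolatedClasspath, parent);
        this.sharedClasses = sharedClasses;
    }

    /** JDK classes and the explicitly shared classes always come from the parent. */
    boolean isParentFirst(String name) {
        return name.startsWith("java.")
            || name.startsWith("javax.")
            || sharedClasses.contains(name);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            if (isParentFirst(name)) {
                // Rule 1: the default URLClassLoader behaviour is parent-first.
                return super.loadClass(name, resolve);
            }
            // Rule 2: child-first, try the isolated classpath before asking the parent.
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                try {
                    loaded = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Assumption of this sketch: fall back to the parent if not found.
                    return super.loadClass(name, resolve);
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }
}
```

With `org.apache.hadoop.fs.ozone.OzoneClientAdapter` in `sharedClasses`, the implementation loaded from the isolated classpath links against the interface that the hadoop2 side already sees.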
## The problem
Here is my current problem:
```
java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileStatus.attributes(ZZZZ)Ljava/util/Set;
at org.apache.hadoop.fs.protocolPB.PBHelper.convert(PBHelper.java:99)
at org.apache.hadoop.ozone.om.helpers.OzoneFileStatus.getFromProtobuf(OzoneFileStatus.java:63)
at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.getFileStatus(OzoneManagerProtocolClientSideTranslatorPB.java:1286)
at org.apache.hadoop.ozone.client.rpc.RpcClient.getOzoneFileStatus(RpcClient.java:962)
at org.apache.hadoop.ozone.client.OzoneBucket.getFileStatus(OzoneBucket.java:479)
```
`FileStatus` should be loaded by Classloader #2, as it's returned by `OzoneFileSystem` (#2) and may be used by the mapreduce application like any other hadoop2 class.
But, as the stack trace shows, we use `org.apache.hadoop.fs.protocolPB.PBHelper` (#3) to serialize/deserialize the `FileStatus` (#2). The hadoop3 `PBHelper` is not compatible with the generic, shared `FileStatus` (#2) from hadoop2: it calls `FileStatus.attributes(...)`, which the hadoop2 class doesn't have.
Theoretically it could be solved by using a hadoop2 `PBHelper` (#2) together with the hadoop2 `FileStatus` (#2). But guess what? `PBHelper` exists only in hadoop 3.
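The descriptor `(ZZZZ)Ljava/util/Set;` in the error reads as "takes four `boolean` parameters and returns a `java.util.Set`". A small reflection helper can check whether the class that is actually visible declares such a method, and which classloader defined it. This is a hedged sketch: `MethodProbe` and its methods are hypothetical names, not part of Hadoop or Ozone.

```java
import java.lang.reflect.Method;

class MethodProbe {

    /** Returns true if type itself declares a method with this name and parameter types. */
    static boolean declaresMethod(Class<?> type, String name, Class<?>... parameterTypes) {
        try {
            Method method = type.getDeclaredMethod(name, parameterTypes);
            return method != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Reports the defining classloader of a type; the bootstrap loader shows up as null. */
    static String describe(Class<?> type) {
        return type.getName() + " loaded by " + type.getClassLoader();
    }
}
```

With Hadoop on the classpath, the check for the stack trace above would be `MethodProbe.declaresMethod(FileStatus.class, "attributes", boolean.class, boolean.class, boolean.class, boolean.class)`. It returns `false` for any `FileStatus` that would trigger this `NoSuchMethodError`.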
## What is the solution
We knew that using a custom classloader would be a difficult path, but we went with it because we had no better idea. Achieving the same with shading would require 3-4 different projects (if it is possible at all).
But:
* We already have 3-4 projects with the classloader magic
* The current issue can be solved only with shading (if possible at all)
* Shading may help to detect the anomalies during compile time
Therefore I suggest giving shading another try. Without it, we can't support hadoop2.
### Shading
Shading means that we need two different `FileStatus` classes:
* `org.apache.hadoop.fs.FileStatus` from hadoop2 (#2)
* `org.apache.hadoop.ozone.shaded.org.apache.hadoop.fs.FileStatus` from hadoop3 (#3)
We need a conversion between the two: create a #2 instance and copy all the attributes from the #3 instance.
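The conversion could look roughly like the sketch below. The two nested classes are simplified stand-ins for the real `FileStatus` classes, and the four fields are an assumed subset chosen only for illustration; the real classes carry more attributes.

```java
class StatusConversionSketch {

    /** Stand-in for org.apache.hadoop.ozone.shaded.org.apache.hadoop.fs.FileStatus (#3). */
    static final class ShadedStatus {
        final String path;
        final long length;
        final boolean directory;
        final long modificationTime;

        ShadedStatus(String path, long length, boolean directory, long modificationTime) {
            this.path = path;
            this.length = length;
            this.directory = directory;
            this.modificationTime = modificationTime;
        }
    }

    /** Stand-in for org.apache.hadoop.fs.FileStatus from hadoop2 (#2). */
    static final class Hadoop2Status {
        final String path;
        final long length;
        final boolean directory;
        final long modificationTime;

        Hadoop2Status(String path, long length, boolean directory, long modificationTime) {
            this.path = path;
            this.length = length;
            this.directory = directory;
            this.modificationTime = modificationTime;
        }
    }

    /** Creates the hadoop2 view by copying every attribute from the shaded instance. */
    static Hadoop2Status toHadoop2(ShadedStatus source) {
        return new Hadoop2Status(
            source.path, source.length, source.directory, source.modificationTime);
    }
}
```

Because the two types have different names, a missed conversion fails at compile time instead of as a `NoSuchMethodError` at runtime.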