<table>
<thead>
<tr>
<th>Data Pipeline Tool</th>
<th>Summary</th>
<th>Best For</th>
<th>Pricing</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Portable</strong></td>
<td>Portable is an ETL platform with 300+ connectors for easy deployment of data pipelines. The tool also builds new data connectors on request at no extra charge, with free maintenance.</td>
<td>Portable is ideal for teams that, in addition to managing data pipelines, also want to extract insights from their data.</td>
<td>
</td>
</tr>
<tr>
<td><strong>Apache Airflow</strong></td>
<td>Apache Airflow is an open-source data pipeline tool for creating, scheduling, and monitoring data workflows. You can extract data from different sources, transform it, and load it into destinations.</td>
<td>Suitable for both startups and enterprises looking to scale up and customize their business processes.</td>
<td>Free</td>
</tr>
<tr>
<td><strong>Oracle Data Integrator</strong></td>
<td>Oracle Data Integrator is a data integration platform that covers the full range of data integration requirements: high-volume loads, batch processing, event-driven integration, and more.</td>
<td>A good fit for organizations and businesses supporting Big Data within the Oracle ecosystem.</td>
<td>Oracle Data Integrator charges per compute instance, at $0.7742 per OCPU (Oracle CPU) per hour.</td>
</tr>
<tr>
<td><strong>AWS Glue</strong></td>
<td>AWS Glue is a serverless data pipeline tool that helps you discover, prepare, move, and integrate data from multiple sources.</td>
<td>Best suited for applications primarily involving ETL, and for running jobs on a serverless Apache Spark-based platform.</td>
<td>Hourly rate based on the number of data processing units (DPUs) required to run your job.</td>
</tr>
<tr>
<td><strong>Apache Kafka</strong></td>
<td>Apache Kafka, originally developed at LinkedIn, is an open-source distributed publish-subscribe event streaming platform capable of delivering data feeds to data pipelines in real time.</td>
<td>Capable of handling large volumes of data, Apache Kafka is the right fit for organizations looking to scale up.</td>
<td>Free</td>
</tr>
<tr>
<td><strong>Kedro</strong></td>
<td>Kedro is an open-source Python framework for creating data pipelines. The tool helps you automate and reproduce data pipelines, making regular tasks easy to complete.</td>
<td>Best fit for projects built by large teams that need to be maintained over a long period.</td>
<td>Free</td>
</tr>
<tr>
<td><strong>Joblib</strong></td>
<td>Joblib facilitates lightweight pipelining in Python. It provides functions that make it easy to dump and load data.</td>
<td>Suitable for processing large amounts of data while saving time and computational cost.</td>
<td>Free</td>
</tr>
</tbody>
</table>
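Several of the tools above implement the same extract-transform-load pattern the table describes. As a minimal, library-free sketch of that flow (the source data, threshold, and "warehouse" here are all hypothetical stand-ins, not any tool's actual API):

```python
import csv
import io

# Hypothetical in-memory CSV standing in for a source system.
RAW = "name,amount\nalice,10\nbob,20\ncarol,15\n"

def extract(raw: str) -> list[dict]:
    """Read rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[dict]:
    """Cast types and keep only rows at or above a threshold."""
    typed = [{"name": r["name"], "amount": int(r["amount"])} for r in rows]
    return [r for r in typed if r["amount"] >= 15]

def load(rows: list[dict], destination: list) -> None:
    """Append rows to the destination (a list standing in for a warehouse table)."""
    destination.extend(rows)

warehouse: list[dict] = []
load(transform(extract(RAW)), warehouse)
print(warehouse)
```

Tools like Airflow or Kedro add scheduling, monitoring, and reproducibility on top of exactly this kind of extract/transform/load chain.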
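The Joblib row mentions dumping and loading data; a minimal sketch of that workflow, assuming the `joblib` package is installed (the `results` object here is a made-up example):

```python
import math
from pathlib import Path
from tempfile import TemporaryDirectory

import joblib

# A toy "transform" result to persist -- any picklable Python object works.
results = {"squares": [i * i for i in range(5)], "pi": math.pi}

with TemporaryDirectory() as tmp:
    path = Path(tmp) / "results.joblib"
    joblib.dump(results, path)    # serialize the object to disk
    restored = joblib.load(path)  # deserialize it later, e.g. in another run

print(restored["squares"])
```

For large NumPy arrays, `joblib.dump`/`joblib.load` are typically more efficient than plain `pickle`, which is why the table flags Joblib as suitable for large amounts of data.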