# Bankaya Test
**Resume** - This document present a test from bankaya and developed by Armando to show the skills as Data Analyst.
* Position - Data Analyst
* Candidate - Armando Carrillo
* Contact Info - carjdex9167@gmail.com
## Task 1 - Diagram Tables
The following digram will describe the tables and the relationship:
* The table `customer_profile` its the place where we can find customer information like the name, alias, when was created if there is active user or not, etc.
* The table `loans` its the place where you can find information about the credit requested as if is active, inactive, the amount, months where should be pay it, when was requested, if the credit was already paid.
* The table `payments` its the transactional table where we can find each payment the user does.
1. The relationship for `customer_profile` should be one to many. One user can has one or more credits
2. The relationship for `loans` should be one to many. One credit need be related a one user with one or many loans.
3. The relation for `payment` should be many to one. A lot payments need be related for one loan.
<iframe width="768" height="432" src="https://miro.com/app/embed/o9J_l4ydL1U=/?pres=1&frameId=3074457367327821995" frameBorder="0" scrolling="no" allowFullScreen></iframe>
## Task 2 - SQL Statements and table examples.
In the task we will find the DDL to create the table and simple `SELECT` for each table and show the first 5 rows. The profile table contains information about the user, `WHO`, each user that contain this table could has or not a `loan_id`.
DDL - Table Profile
SQL statement to create the table with the basic commands.
```sql
CREATE TABLE customer_profile (user_id uuid, first_name varchar(255), last_name varchar(255), surname varchart(255), birthdate datetime, address varchar(255), phone_number number)
```
This will be the description table (Table - customer_profile):
| customer_profile | type |
|------------------|----------|
| user_id | uuid |
| first_name | string |
| last_name | string |
| surname | string |
| birthdate | datetime |
| address | string |
| zip_code | number |
| phone_number | number |
This will show the first 5 rows
```sql
SELECT * FROM customer_profile LIMIT 5;
```
| user_id | firtst_name | last_name | surname | birthdate | address | phone_number |
|--------------------------------------|-------------|-----------|---------|------------|-------------|--------------|
| a9f1c6f2-b738-443b-9b40-0eeceb5ad72e | Adriana | Carrillo | adri | 1991-01-15 | puebla 123 | 123456 |
| e4ccc448-80c8-4d40-9568-2582a72c51eb | Armando | Gonzales | armand | 1993-03-18 | jalisco 432 | 456789 |
| 93701a77-dfb3-4bac-b86a-3c55153154ee | Ivan | Martinez | ivan | 1996-06-21 | colima 234 | 789456 |
| df85658d-820c-4b1c-94df-cbcfc1a5a2a1 | Marisol | Ramirez | mari | 1999-09-24 | oaxaca 917 | 456123 |
| 2421ac72-4420-46a9-a9b1-bb5af7313831 | Ana | Gutierrez | ana | 2002-12-27 | merida 183 | 789123 |
### Loan Table
This table contain information of the loans; the loans table has the information about each loan (like the configuration, who, how, when, what) about the credit. Each loan should has a user assigned `user_id`. Each loan could have or not a `payment_id`.
DDL Table
```sql
CREATE TABLE loan (loan_id uuid, user_id uuid, amount float, created_at datetime, ended_at datetime, week_assigned number)
```
Table - Loan
| loan_id | uuid |
|---------------|----------|
| user_id | uuid |
| amount | float |
| created_at | datetime |
| ended_at | datetime |
| status_code | number |
| status_msg | string |
| week_assigned | number |
Query to execute the first table rows
```sql
SELECT * FROM loans LIMIT 5;
```
| loan_id | user_id | amount | created_at | ended_at | week_assigned | status_code | status_msg |
|--------------------------------------|--------------------------------------|-----------|------------|------------|---------------|-------------|---------------|
| 4fe6535f-5320-4d6d-8206-32cf4fb3276b | a9f1c6f2-b738-443b-9b40-0eeceb5ad72e | 16839.368 | 2018-02-03 | 2020-07-22 | 8 | 0 | in progress |
| c9dd5764-d29e-44ce-8553-6e6765057f2f | e4ccc448-80c8-4d40-9568-2582a72c51eb | 19325.3 | 2018-02-04 | 2020-07-23 | 12 | 1 | paid |
| f94df6e0-f831-4ca5-97a9-aac896067d1a | 93701a77-dfb3-4bac-b86a-3c55153154ee | 56876.52 | 2018-02-05 | 2020-07-24 | 16 | 4 | error |
| 2d4ab16c-b5b3-43f1-9590-0dce51ec1c73 | df85658d-820c-4b1c-94df-cbcfc1a5a2a1 | 43000 | 2018-02-06 | 2020-07-25 | 20 | 3 | not completed |
| 11edad98-9208-4c32-849a-013911330562 | 2421ac72-4420-46a9-a9b1-bb5af7313831 | 155698.05 | 2018-02-07 | 2020-07-26 | 24 | 1 | paid |
### Payment Table
This table contain information about the payments; this a transactional table, we we are getting the payments events wich mean when a payment it happend and we are saving when happend, what happend, who did it, was approved? or not?, etc. Each payment should has a `loan_id`.
DDL - Payment Table
```sql
CREATE TABLE payment(payment_id uuid, loan_id uuid, status_code number, status_msg varchar(255))
```
Payment Description table;
| payment_id | uuid |
|-------------| --------|
| loan_id | uuid |
| status_code | number |
| status_msg | string |
| created_at |datetime |
```sql
SELECT * FROM payment LIMIT 5;
```
| payment_id | loan_id | status_code | status_msg |
|--------------------------------------|--------------------------------------|-------------|---------------|
| 7a92892f-04c3-43b7-abf6-694e28ab47d8 | 4fe6535f-5320-4d6d-8206-32cf4fb3276b | 0 | in progress |
| be0eae7b-3343-425b-9859-daf1e51f396e | c9dd5764-d29e-44ce-8553-6e6765057f2f | 1 | paid |
| b166967e-b464-45a9-8a98-44c953f496f0 | f94df6e0-f831-4ca5-97a9-aac896067d1a | 4 | error |
| 8925ad58-5337-4af0-a4f0-34b0093d3f3a | 2d4ab16c-b5b3-43f1-9590-0dce51ec1c73 | 3 | not completed |
| dbe137da-88ed-46e2-8642-3af6100ce930 | 11edad98-9208-4c32-849a-013911330562 | 1 | paid |
## Task 3
Here we will find the business question and how will be answered.
### Task 3.1
Select all customers that have multiple active loans:
```sql=
SELECT
user_id -- this means the user
,COUNT(loan_id) as Qty_loans -- each id means a loan, just lets be use count to know the qty
FROM loans
WHERE status_code = 0 -- flag to know the active loans
GROUP BY user_id
HAVING Qty_loans >= 2 -- we want only all the users had more than 2 loans
ORDER BY Qty_loans DESC;
```
Note: I am not using`customer_profile` to gell all the users because the sentense say `ACTIVE` loans. Is not necessary take the customers table for all the users has't `ACTIVE` loans or even the customer don't have a loan.
### Task 3.2
Select all customers that have made two consecutive payments any of their active loans:
```sql=
WITH tbl_active_loans AS (
SELECT
l.user_id -- With the id we can get custmer info
,l.loan_id -- Identify the loan or loans as active
,p.payment_id -- identify the payments
,p.created_at -- This column we will use it as flag eye to see the secuence
,row_number() over (partition by l.user_id,l.loan_id order by p.created_at DESC) as row_number -- This will order the sequence payments, will take as 1 the highest date.
FROM loans l
INNER JOIN payment p ON l.loan_id = p.loan_id
WHERE l.status_code = 0 -- Get the Active loans
)
-- The sub query its to be more clean for reading and performance.
SELECT *
FROM tbl_active_loans
WHERE row_number > 1 -- We want to know more of one payment
ORDER BY created_at DESC
```
Note: the sentense is not requering a quantificable information, thats why I developed a query to show only the information. the key to see the consecutive payments should be by the event `created_at`
### Task 3.3
Select the total amount paid and the average number of payments made by customers that have only one active loan
```sql=
WITH tbl_active_loans AS (
SELECT
DISTINCT loan_id
FROM loans
WHERE status_code = 0 -- Flag to get the active loans
)
SELECT
SUM(p.amount) AS TPV -- Total Payment Volume
,AVG(p.payment_id) AS Avg_payments
FROM payment p
WHERE loan_id IN (SELECT laon_id FROM tbl_active_loans)
```
Note: I can use a LEFT JOIN but why not use temp table?
### Task 3.4
Select the total amount paid grouped by zip code and age buckets
```sql=
SELECT
SUM(p.amount) AS total_credit
,a.zip_code
FROM customer_profile AS a
LEFT JOIN loans l ON a.user_id = l.user_id
LEFT JOIN payment p ON l.loan_id = p.loan_id
GROUP BY a.zip_code;
```
Note: I don't understand what age buckets means.
### Task 3.5
What is the difference between LEFT JOIN and INNER JOIN. Please explain shortly and present a code example using your defined tables.
INNER JOIN: Will join two tables when the column key exists in both sides. If one table doesn't contain the same exact record. the `INNER` will not take it.
LEFT JOIN: Will join two tables and it will show the records even if the row dones't exist on the table `B` will show as `null`
Lets use `customer_profile` and `loans` tables as example. If we use Inner join should be like
```sql=
SELECT a.*
FROM customer_profile a
INNER JOIN loans l ON a.user_id = l.user_id
LIMIT 10; -- Always take care the performance, lets use a limit when we know ifo table, or we don't know the qty of rows.
```
This query will get anly the customers request for a credit.
The customer profile its only recording information about the user and loans has information about the credits, a user could exist but could have or not a loan.
Now, lets use LEFT JOIN:
```sql=
SELECT a.*
FROM customer_profile a
LEFT JOIN loans l ON a.user_id = l.user_id
LIMIT 10; -- Always take care the performance, lets use a limit when we know ifo table, or we don't know the qty of rows.
```
This query will show all the customers and if they had loans will shown or will show the customer as a null, which means the customer still not request a credit/loan.
### Task 3.6
Select your Payments table and add the column "payment_number". Find the expected output in the following chart:
```sql=
ALTER TABLE payment ADD COLUMN payment_number number AS (NUMBER(38,0));
```