## A simple Spark program for you to try
1. Edge node / S3 bucket - daily files arrive here
2. File format - XYZ_20221021.zip
3. Inside the zip - an employee.csv file with columns: first_name, last_name, id, salary, dept_id (see the reading sketch after this list)
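
How the zip is opened and how the date is pulled out of the file name is left open above, so here is a minimal local sketch, assuming the day's file sits at a hypothetical path like `../XYZ_20221021.zip`:

```python
# Minimal sketch of the landing step; the zip path is hypothetical.
import re
import zipfile

import pandas as pd

zip_path = '../XYZ_20221021.zip'

# The partition date comes from the file name itself, e.g. "20221021".
file_date = re.search(r'_(\d{8})\.zip$', zip_path).group(1)

# employee.csv sits inside the zip; pandas can read the stream directly.
with zipfile.ZipFile(zip_path) as zf:
    with zf.open('employee.csv') as f:
        df = pd.read_csv(f)

print(file_date)   # "20221021"
print(df.columns)  # first_name, last_name, id, salary, dept_id
```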
### Transformations needed
- Create a new name column: name = first_name + last_name
- Calculate the salary ranking for each employee within dept_id and add a new salary_ranking column. E.g., for the IT or Admin dept, whether an employee's salary is the highest, 2nd highest, etc. (see the window-function sketch after this list)
- Ingest the final data into a Hive table / HDFS or an S3 bucket, partitioned by the date taken from the file name itself ("20221021"); a sketch of the write follows the solution block below.
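
Since the exercise asks for Spark, the two transformations are worth sketching with the DataFrame API as well. A minimal sketch, assuming the CSV has already been extracted from the zip and a local SparkSession (the app name is illustrative):

```python
# PySpark sketch of the two transformations; names are illustrative.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName('employee-daily-load').getOrCreate()

df = spark.read.csv('employee.csv', header=True, inferSchema=True)

# name = first_name + last_name
df = df.withColumn('name', F.concat_ws(' ', 'first_name', 'last_name'))

# Salary ranking within each dept_id: highest salary gets rank 1.
dept_window = Window.partitionBy('dept_id').orderBy(F.col('salary').desc())
df = df.withColumn('salary_ranking', F.dense_rank().over(dept_window))
```

dense_rank keeps tied salaries at the same rank with no gaps; swap in F.rank() if gaps after ties are wanted.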
```python
### put your code here
import pandas as pd

path = '../employee.csv'
df = pd.read_csv(path)

# name = first_name + last_name (axis=1 applies the lambda row-wise)
df['name'] = df.apply(lambda x: x['first_name'] + ' ' + x['last_name'], axis=1)

# Salary ranking within each dept_id: highest salary gets rank 1
df['salary_ranking'] = (
    df.groupby('dept_id')['salary']
    .rank(method='dense', ascending=False)
    .astype(int)
)
```
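
The pandas snippet above stops at the transformations; the ingest step from the task would look roughly like this in PySpark, reusing df from the window-function sketch and file_date from the zip-reading sketch (the bucket and table names are hypothetical):

```python
# Hedged sketch of the ingest step; df and file_date come from the
# earlier sketches, and the output locations are hypothetical.
from pyspark.sql import functions as F

# Tag every row with the date parsed from the file name.
df = df.withColumn('file_date', F.lit(file_date))

# Write to S3/HDFS as Parquet, partitioned by the date column:
df.write.mode('append').partitionBy('file_date').parquet('s3a://my-bucket/employee/')

# Or into a Hive table (SparkSession built with .enableHiveSupport()):
# df.write.mode('append').partitionBy('file_date').saveAsTable('db.employee')
```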