---
tags: excel
title: Working with CSV format - Part 2
description: A simple tutorial on how to import CSV data in your spreadsheet and start using that
---
Picking up from where we'd left off in in part 1, we can start with planning for potential tax computations
# Planning for tax computation
At this point, we've computed units outside of LTCG zone (purchased more than a year ago, that's still in investor's portfolio).
We need to compute these:
- tax liability, if investor were to sell all these units.
- total number of units present, where net tax liability is equal to or below 1L, if investors were to sell those
---
### How to compute Tax?
Tax is `SUM(selling price of units - purchase price of units)`.
Say, an investor purchased
- `U1` units, on date `t1`, at price `P1`.
- `U2` units, on date `t2`, at price `P2`.
- `U3` units, on date `t3`, at price `P3`.
Assume selling price (NAV on day of selling) is `P`.
This investor has `(U1 + U2 + U3)` available for selling. Tax is:
```
[(U1 * (P - P1) + U2 * (P - P2) + U3 * (P - P3)]
```
Which can be written as:
```
(U1 + U2 + U3) * P - (U1 * P1 + U2 * P2 + U3 * P3)
```
In previous section, we've computed effective value of `(U1 + U2 + U3)`, for any number of transactions.
We know `P1`, `P2`, `P3`; for all those transactions.
Only missing information is sellig price, `P`.
What we need, is selling price as on today (the given day when investor is attempting the sell transaction).
In other words, we need a way to get access to latest NAV of these funds.
Because this NAV would change every day (equity mutual funds update NAVs at the end of the day, every business day); it'd be better to _programmatically_ access it, like we used `GOOGLEFINANCE()` for historic data.
`GOOGLEFINANCE()` can be one option here. But as we've learned in the previous section, it has issues (remember seeing `#NA` all over the place?)
Instead, we'd go to the source!
---
### Getting the latest NAV
Every AMC has a regulatory requirement to update their NAV daily with AMFI (**A**ssociation of **M**utual **F**unds in **I**ndia). And AMFI makes this data available for parsing, in a **CSV format**, on their website.
This is the link: https://www.amfiindia.com/spages/NAVAll.txt
Remember when you were thinking learning `SPLIT()` `TRANSPOSE()` would've been of no use? Well, as you're about to see, not all CSV data come in text files.
Google Sheets has an `IMPORTDATA()` function, that'd do just fine. We provide it a URL for the AMFI NAV endpoint, and it'd fetch that latest raw NAV data.
Then we can _split_ that raw NAV feed into cells, as we see fit.
---
### Finding your fund
This AMFI URL has ~20k entries. To be able to find the funds as in the CSV transaction dump, our excel matchers would be fine.
But the problem is this: the name of fund can appear differently in that list, as opposed to how it appears in the CSV.
Finding fund by matching strings would lead to misses.
A better option would be to find fund the way it's usually don - by a unique identifier.
This unique identifier is already present in the AMFI NAV feed URL: **`ISIN`**
ISIN (**I**nternational **S**ecurities **I**dentification **N**umbers) is unique across global securities market. Not only can there be no two funds with same ISIN in India, there cannot be in the world!
We have to find ISIN for each of the funds in CSV transaction, manually, and add a column in the second sheet, next to each fund, to add its ISIN.
List of ISIN is also present in AMFI NAV feed.
This is actually a straight-forward, one-time exercise. It should also make it clear to the reader why having too many funds to portfolio can be cumbersome :sweat_smile:
---
# Execution
### Adding ISIN column
Let's start by adding the ISIN column first
In the second sheet (not the one named `data`), add a column next to fund's name. You can use _Insert Column_.
Insert ISINs as follows (copy-paste from below):
| ISIN |
|--------------|
| INF879O01019 |
| INF205K01DN2 |
| INF846K01CH7 |
| INF769K01036 |
| INF769K01101 |
| INF179K01CR2 |
| INF090I01569 |
| INF769K01010 |
| INF090I01239 |
| INF090I01171 |
| INF090I01809 |
| INF109K01BL4 |
| INF740K01185 |
Final results, should look like this
| Fund Name | ISIN |
|------------------------------------------------------|--------------|
| Parag Parikh Flexi Cap Fund - Regular Plan - Growth | INF879O01019 |
| Invesco India Multicap Fund - Growth | INF205K01DN2 |
| Axis Focused 25 Fund - Regular Plan - Growth | INF846K01CH7 |
| Mirae Asset Large Cap Fund - Regular Plan - Dividend | INF769K01036 |
| Mirae Asset Emerging Bluechip Fund - Regular Growth | INF769K01101 |
| HDFC Mid-Cap Opportunities Fund - Growth | INF179K01CR2 |
| Franklin India Smaller Companies Fund - Growth | INF090I01569 |
| Mirae Asset Large Cap Fund - Regular Plan - Growth | INF769K01010 |
| Franklin India Flexi Cap Fund - Growth | INF090I01239 |
| Franklin India Bluechip Fund - Growth | INF090I01171 |
| Franklin India Prima Fund - Growth | INF090I01809 |
| ICICI Prudential Bluechip Fund - Growth | INF109K01BL4 |
| DSP Tax Saver Fund - Regular Plan - Growth | INF740K01185 |


---
### Fetch and Process Latest NAV from AMFI
Follow these steps
- Create a new sheet, name it _AMFI NAV (Raw)_
- In a cell, enter the AMFI URL: `https://www.amfiindia.com/spages/NAVAll.txt`
- Invoke `IMPORTDATA()` function, and pass the cell ID of the URL above, as argument
- Given it few seconds, it'd fetch latest NAV from AMFI for all mutual funds. It should add some 18k+ rows.
This data is in CSV format, separated by semicolon, per line.
- Create another new sheet, name it _AMFI NAV (Processed)_
- In this sheet, add somewhere this formula
`=IF(ISBLANK(<cell ID>), , SPLIT(<cell ID>, ";")`, where `<cell ID>` points to a row of semicolon-separated data, in the sheet _AMFI NAV (Raw)_.
`ISBLANK(<cell ID>)` checks if the cell is empty. As you can see from the data, there can be plenty of empty cells. Invoking `SPLIT()` on these, would result in an error, it'd show up as a `#VALUE` error on spread sheet.
`IF(condition, arg1, arg2)` checks if `condition` is true for a cell (in this case, if the cell is empty). If so, the first argument `arg1` in inserted in the result cell (not the cell on which condition is being evaluated). Otherwise, `arg2` is inserted.
`=IF(ISBLANK(<cell ID>), , SPLIT(<cell ID>, ";")` basically means _if `<cell Id>` is empty, don't put anything, else split the content of the cell by semicolon, and put the results in cells._
- Use autofill drag to fill out all 18k+ rows in new sheet.
Refer to the following video(s) for reference
{%youtube URcjv1P_dcA%}
{%youtube LGD_N6mBCXA%}
Your sheets would now have latest available NAV from all mutual funds, in a nice table format.
---
### Computation of Value as on Today
In this section, we'd compute net redemption value of units older than 1 year.
This is total available units older than one year (already computed earlier), multiplied by latest NAV.
We can look up latest NAV using `VLOOKUP` with ISIN as search query.
`VLOOKUP()` stands for _vertical lookup_ or row-wise lookup, across columns. It's a way of _searching_ for matching data.
There's also `HLOOKUP()` and `LOOKUP()`; but for our use case, `VLOOKUP()` would work fine for now.
Follow these steps:
- Come back to the main sheet, which has information on available units and sold units.
- Add a column at the end, titled _Latest Value_
- Invoke `VLOOKUP()` for the top-most row as follows:
```
VLOOKUP(
<cell ID that has ISIN for the fund>,
<range of data from processed NAV sheet>,
<column index, that has NAV for the fund>
false
)
```
First argument, `<cell ID that has ISIN for the fund>` is ISIN, but we're doing it this way, to be able to drag and auto-fill.
Second argument, `<range of data from processed NAV sheet>` is a bit interesting.
**`VLOOKUP()` does not work if the column we are searching for is not the first column of a range**. In our processed NAV sheet, the ISIN is the second column.
We have to pick a range, leaving out the first column.
This range is a combination of rows and cells. For instance, if your processed NAV sheet starts at `B3` with `Scheme Code` text in it; then, our range would be something like `C9:F18123` (leaving out the last, date column).
Since we'd be using drag & auto-fill, we can lock these cells: `$C$9:$F$18123`.
Third argument, `<column index, that has NAV for the fund>`, is colum index of the column for the result. We're looking for NAV when ISIN matches, which is the 4th column in our range (ISIN is the first column in this range). Hence, we can put this value as 4.
Final argument, is `false`. It tells `VLOOKUP()` that the dataset in the range is not sorted. Our data set is not sorted on the basis of ISIN (sorted alphabetically on AMC name, actually), hence `false` is the right value to pass here.
Refer to this video if you need help:
{%youtube WUavhFi8XLY%}
{%youtube QDdToO78UgQ%}
Final outcome in the newly added column, should match these screenshots:


---
### Adding up total cost of purchase
On surface, this sounds easier than previous steps.
All one has to do, is go and _sum_ on units and NAV column, where date is much before the date 1 year ago.
In fact, investment amount is provided in the CSV transaction dump, so multiplication isn't even needed (invested amount = unit x NAV on that date).
Except, it's not that simple.
We have realized that even if units were acquired more than one year ago, doesn't mean some of those weren't sold in last 1 year, or before.
Remember that available units is **NOT** sum of all units purchased more than a year ago. We had to subtract the sold units (sold at any point in time, not just more than a year ago) to arrive at LTCG-eligible units.
In other words, we need to find unit purchase price for only these units. Not that of all units.
Another way to look at it: _keep adding purchase price, until total number of units added is same as LTCG-eligible units_.
---
#### An Example
An example to illustrate this idea:
Say, an investor is buying 10 units every month, from 2018 January, to 2020 December.
At the end of December 2020, 12 * 10 = 120 units purchased between January 2018 and December 2018, are eligible for LTCG.
Now assume that in 2020, at some point, this investor has also sold 54 units.
Then, at that point, they've (120 - 54) = 66 units available for LTCG-eligible redemption.
What were the purchase price(s) for these 46 units?
More importantly, when were _these_ 66 units acquired?
We know that investor purchased 10 units every month, from Jan 2018 to Dec 2020. Investor has also sold 54 units some time in 2020.
Redemption works in a FIFO (**F**irst **I**n **F**irst **O**ut) manner in folio. Meaning, earliest purchased units are to be redeemed first.
We first need to account for 54 units that were sold. These units were acquired as follows -
| Month of Purchase | Number of units |
| -------- | -------- |
| January 2018 | 10 |
| February 2018 | 10 |
| March 2018 | 10 |
| April 2018 | 10 |
| May 2018 | 10 |
| June 2018 | 4 |
Once we've depleted the _earliest-purchased-earliest-sold_ scheme for already redeemed units, we can start to account for remaining 46 units.
These 66 LTCG-eligible units were acquired as follows:
| Month of Purchase | Number of units |
| -------- | -------- |
| June 2018 | 6 |
| July 2018 | 10 |
| August 2018 | 10 |
| September 2018 | 10 |
| October 2018 | 10 |
| November 2018 | 10 |
| December 2018 | 10 |
When computing purchase price for these 66 units, we need to use purchase price from these months; and not purchase price from January 2018.
In other words, **this is going to be the most complex part of the computation**.
To recap, we compute purchase cost / acquisition cost as follows:
- Account for existing sell transactions, depleting units, starting from initial purchase transaction; until total units add up to all units sold so far.
- Start adding up units from this date, till it adds up to total available LTCG-eligible units.
---
#### A Special Case
Luckily, we have a special scenario on our hands.
In our case, if you notice the units sold and LTCG-eligible units in the column, investor has either never sold any units from their current holdings in a specific fund; or they've completely sold off all their units in that specific fund.
Which means, for funds where there are LTCG eligible units; we **do NOT need to account for previous sell transactions**.
In other words, one can start adding up the purchase transaction costs from the very first purchase.
This won't be the case for a typical CSV transaction dumps for most investors.
### Grandfathering LTCG Computations
We've so far carefully avoided discussing the _grandfathering_ clause in equity LTCG computation.
Union budget 2018, defined purchase price of equity, for LTCG computation as higher of price _as on_ 31st January 2018, or actual purchase price - if units were bought before this date.
To factor this in, we'd need information on NAV price as on 31st January, for each of the funds present in the CSV transaction dump, which has LTCG-eligible units.
The AMFI URL for latest NAV won't do.
Instead, we can use this URL for fetching historic data: `http://portal.amfiindia.com/DownloadNAVHistoryReport_Po.aspx?mf=<AMC Code>&tp=1&frmdt=31-Jan-2018&todt=31-Jan-2018`
This URL returns data in CSV format as before. However, it's specific to each AMC, identified by an AMC code.
An AMC code is a number, between 1 and 100. However, not all numbers between 1 and 100 correspond to a valid AMC code. You'd have to try out each number, and see which ones result in valid URLs that return data.
For example, Nippon AMC is identified by AMC code 21. This URL would return NAV history of all Nippon AMC funds, as on 31st January 2018: `http://portal.amfiindia.com/DownloadNAVHistoryReport_Po.aspx?mf=21&tp=1&frmdt=31-Jan-2018&todt=31-Jan-2018`
You can follow similar steps as earlier to incorporate data from this URL into your calculations.
---
# Wrapping Up
We won't be covering this calculation here, as it'd be quite complex.
It's left as an exercise to the reader; as it won't be of much value to do these with Excel / Spreadsheet, for most investors.
To achieve such conditional summing-up, one would most likely need to use `QUERY()` function, writing something akin to SQL queries.
It'd also be much harder to update or understand. for another person going through same excel sheet or spreadsheet; which would have such esoteric constructs.
At that point, using a for-loop equivalent in commonly used programming languages (Python / Node / Java / Ruby / Go etc.) would be much easier to reason about.
Another option would be to use GApps Script, in Google Sheets; which is basically JavaScript to program your sheets. However, we won't be covering that here.