Sharath joined the IndicNLP project yesterday, so we can go ahead and plan the next steps. Shall we start by going through the datasets and formulating one or more smaller problem statements?
June 3rd Minutes of Meeting (IIITB team meets Janastu team)
https://open.janastu.org/daily/minutes/2021/june2021#03-june-2021 (see links referred)
**3 repositories to share:**
@mani to help oversee sharing the online repos from the hard drives with Sharath
Anthillhacks 2015 and Nam Halli Radio 2019-2020
[ ] da.pantoto.org and dash.swtr.us and papad
[ ] scrape namdu1radio.com
http://da.pantoto.org/
http://dash.swtr.us/#home
https://www.namdu1radio.com/
http://papad.pantoto.org/
Nam Halli Radio - Covid Campaign Audio
https://drive.google.com/drive/folders/1MC4A00umYHqRA9nF13W-eM3Ro-H0DAWT?usp=sharing
Nam Halli Radio - Garikur Radio Neelgiris
https://drive.google.com/drive/folders/1cjD9hD1RvJYNjZOdWr_SpoY9goO1ynrS?usp=sharing
Nam Halli Radio - Indonesia radio files
https://drive.google.com/drive/folders/1dacPFP6VfN-9U_Q-GVdr-CQj1OEieyTV?usp=sharing
Nam Halli Radio covid19 campaign
https://files.janastu.org/s/KbGd5RJGswsNna5
AnthillHacks2019 audio https://drive.google.com/drive/folders/1LtflOdWet3RrAEonzh9b2bK5XnCeMZog?usp=sharing
Mirzapur girls activity data (gig enabling)
[ ] provide links and fragment tagging sheets @shafali @madhu ?
[ ] Nextcloud data related to this
humani pehchan recording
https://docs.google.com/document/d/1K1HBTag8QP7Gc8cGeGdw5_1OG0ltWqOSUxCRLAbiXlk/edit
Mirzapur - Papad audio annotations data
https://docs.google.com/spreadsheets/d/1aHN2NAQLKTQnlrDUnn7Ri6VkOR-RnDtNbALNfEqeB1k/edit?usp=sharing
Village Diaries, Songs, Stories, Videos
https://docs.google.com/document/d/1AgdW-T3d08GsChkdJb1dBqFT-orGCbpsvIV67uyHoVg/edit?usp=sharing
Mirzapur-Activity
https://files.janastu.org/s/6sJQT2Nrd95pcAy
Mirzapur Girls: humare game plan https://docs.google.com/document/d/1u5U7fQft4OhIhEgqy1j1LSCNGXY2jJj4ISQAiCALc2g/edit
Content Collected from GG
https://docs.google.com/document/d/1AgdW-T3d08GsChkdJb1dBqFT-orGCbpsvIV67uyHoVg/edit
Humanre Sapne https://docs.google.com/document/d/1wq1TOcJ4khVgJ26GbdycQrgz6AnHSN0SZ95zUCUvgmM/edit
Drive link https://drive.google.com/drive/u/0/folders/1BEaaE5rfIWPac9bQEGKnSXLBnc2xOBSH
[ ] pointers to oral histories (see sandbox)
13ways-sandbox-oral-history: http://stories.archives.ncbs.res.in/exhibit/13ways/#/theme/sandbox/oral-history
(Sharath had a discussion with Prof. Ram's students, who will help us with ASR.)
The following points came out of the discussion:
To improve the accuracy of conversion, the following is planned:
Sharath will share the corpus with points 1&2 by 6th July.
(Indic NLP Dataset and Concepts Discussion with Sharath, Dinesh & Mani)
Janastu's work can be broadly classified as:
This is achieved by Alipi and SWeets, which work as follows. For a given piece of content, renarrations are compiled from annotations, or SWeets, attached to sections of that content. SWeets are semantic annotations for sections of text written in languages such as English. SWeets have target languages. Alipi is a platform that aggregates SWeets and compiles renarrations: it collects all SWeets in a target language for a piece of content and creates a renarration in that language. A SWeet can be text, audio, or video content in the target language. The SWeets scheme was replaced by the W3C Annotations scheme.
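The aggregation step described above can be sketched roughly as follows. This is a minimal illustration, not Alipi's actual code or data model: the `SWeet` class, its field names, the `compile_renarration` function, and the sample strings are all hypothetical, and the rule that the last SWeet wins when several target the same section is an assumption made only to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class SWeet:
    """Hypothetical annotation record; field names are assumptions."""
    section_id: str   # section of the source content being annotated
    target_lang: str  # language of the renarrated section
    body: str         # text, or a link to audio/video, in the target language


def compile_renarration(sections, sweets, target_lang):
    """Replace each section with its SWeet in target_lang, if one exists.

    Sections without a matching SWeet are kept unchanged.
    If several SWeets match a section, the last one in the list is used.
    """
    by_section = {
        s.section_id: s.body for s in sweets if s.target_lang == target_lang
    }
    return [(sid, by_section.get(sid, text)) for sid, text in sections]


# Placeholder content: two sections, one of which has a Kannada SWeet.
sections = [("p1", "Wash your hands."), ("p2", "Wear a mask.")]
sweets = [
    SWeet("p1", "kn", "[kn] renarration of p1"),
    SWeet("p1", "hi", "[hi] renarration of p1"),
]
renarration = compile_renarration(sections, sweets, "kn")
```

Here `renarration` keeps `p2` in the original language because no Kannada SWeet exists for it, which is one possible way to handle partially renarrated content.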
The scope of the project "IIITB Janastu Mphasis on Intentional Audio Networks" is to link audio content. We discussed processing the audio content and interlinking it based on semantics, n-grams, etc. Interlinking audio is an open problem, and Sharath will be working on it. For initial analysis and study, we have picked the "Nam Halli Radio - COVID Campaign Audio" corpus (https://drive.google.com/drive/folders/1MC4A00umYHqRA9nF13W-eM3Ro-H0DAWT?usp=sharing).
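As one possible starting point for the n-gram part of the interlinking idea, the sketch below links clips whose transcripts share enough word n-grams. It assumes transcripts are already available (for example from ASR), uses plain whitespace tokenization, and scores pairs with Jaccard similarity. The function names, the threshold value, and the sample transcripts are illustrative assumptions, not decisions taken in the discussion.

```python
from itertools import combinations


def ngrams(text, n=2):
    """Return the set of word n-grams in a transcript (whitespace tokens)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_clips(transcripts, n=2, threshold=0.2):
    """Return (clip_x, clip_y, score) for pairs at or above the threshold."""
    grams = {clip: ngrams(text, n) for clip, text in transcripts.items()}
    links = []
    for x, y in combinations(sorted(grams), 2):
        score = jaccard(grams[x], grams[y])
        if score >= threshold:
            links.append((x, y, score))
    return links


# Made-up transcripts, used only to exercise the functions.
transcripts = {
    "clip_a": "wash your hands with soap and water",
    "clip_b": "please wash your hands with soap",
    "clip_c": "the market opens on monday",
}
links = link_clips(transcripts)
```

With these sample transcripts only `clip_a` and `clip_b` are linked, since they share four of seven distinct bigrams. Semantic linking would need a different similarity measure in place of `jaccard`.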
Indic NLP Technical Discussion (Prof. Sridhar & Sharath)