# ESG OPTISOLVE ASSIGNMENT 1

## Technical Interview Assignment for ML Engineer Role: ESG Optisolve

### Task Overview:

You are required to design and implement a small-scale prototype that demonstrates your skills in machine learning, specifically focusing on data collection, Natural Language Processing (NLP), and Large Language Models (LLMs) such as those offered by OpenAI. The prototype should be relevant to the functionalities of ESG Optisolve, which deals with ESG data collection, management, and analysis.

### Assignment: ESG Data Aggregation and Sentiment Analysis Prototype

**Objective**: Develop a prototype that collects ESG-related news articles from online sources and performs basic sentiment analysis on the collected data. This prototype should give us insight into your ability to work with data collection, NLP, and sentiment analysis, which are key skills for the ML Engineer role in ESG Optisolve.

### Task Specifications:

**Data Collection:** Write a script to collect ESG-related news articles using freely available online news APIs or web scraping. Ensure the script can collect a minimum of 50 recent articles.

**Data Preprocessing:** Clean and preprocess the collected data for NLP tasks. Include steps like tokenization, removing stop words, and any other necessary preprocessing steps.

**Sentiment Analysis:** Implement a basic sentiment analysis on the collected articles. You can use pre-built NLP libraries like NLTK, TextBlob, or any other of your choice. The analysis should categorize each article as having a positive, negative, or neutral sentiment.

**Results Presentation:** Summarize the results in a simple manner showing the distribution of sentiments across the collected articles.

### Deliverables:

**Code:** Well-documented source code for the script and analysis. The code should be submitted in a clean, readable format, preferably in a Python notebook (Jupyter/Google Colab).

**Report:** A brief report explaining your methodology, tools used, and any assumptions made. Include a summary of the sentiment analysis results and any interesting insights observed.

**Evaluation Criteria:**

* Correctness and Efficiency of Data Collection Script.
* Quality and Clarity of Data Preprocessing Steps.
* Accuracy and Implementation Approach of Sentiment Analysis.
* Overall Code Quality and Documentation.
* Insightfulness of the Summary Report.
* Time to Complete: 3-4 hours

### Submission Instructions:

Submit the completed assignment within 5 days from the assignment date. Share the code via a GitHub repository link or a shared Google Colab notebook. Include the report as a PDF or in the notebook itself.

**Notes:**

* This assignment is designed to assess basic skills in data collection, NLP, and sentiment analysis.
* Focus on demonstrating clear coding practices, basic understanding of NLP tasks, and the ability to draw simple insights from data analysis.
* It’s more important to have a working prototype with clear documentation than to have an advanced but incomplete solution.

Questions can be sent to p20230809@hyderabad.bits-pilani.ac.in
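
As a rough illustration of the preprocessing, sentiment analysis, and results presentation steps described in the Task Specifications, the sketch below runs a tiny pipeline over three made-up headlines. It is only one possible shape for a solution, not a reference answer: the sample headlines, the stop-word list, the hand-written sentiment lexicon, and the function names `preprocess`, `classify`, and `sentiment_distribution` are all assumptions made for the example. A real submission would likely replace the lexicon with NLTK, TextBlob, or a similar library, and would load at least 50 collected articles instead of the placeholders.

```python
import re
from collections import Counter

# Placeholder headlines standing in for collected articles (made up for this sketch).
SAMPLE_ARTICLES = [
    "Solar firm wins award for strong emissions progress",
    "Regulator fines miner over toxic spill and poor safety",
    "Company publishes its annual sustainability report",
]

# Tiny illustrative word lists (assumptions); a library such as NLTK or
# TextBlob would normally supply stop words and sentiment scores.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "on", "for", "and", "its", "is", "over"}
POSITIVE_WORDS = {"wins", "award", "strong", "progress", "improves"}
NEGATIVE_WORDS = {"fines", "toxic", "spill", "poor", "lawsuit"}


def preprocess(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def classify(tokens):
    """Label tokens positive, negative, or neutral by counting lexicon hits."""
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_distribution(articles):
    """Count how many articles fall into each sentiment category."""
    return Counter(classify(preprocess(article)) for article in articles)


if __name__ == "__main__":
    for label, count in sentiment_distribution(SAMPLE_ARTICLES).items():
        print(f"{label}: {count}")
```

Keeping preprocessing, classification, and aggregation in separate functions makes it straightforward to swap the lexicon-based `classify` for a library-backed scorer without touching the rest of the pipeline.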