# Creating datasets!

<iframe src="https://giphy.com/embed/4rL0k8t2mSmWbzO8fl" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/Similarweb-may-the-4th-be-with-you-similarweb-shelly-skandrani-4rL0k8t2mSmWbzO8fl"></a></p>

---

### Get your folders ready!

**Remember:**
- You must have between 50 and 100 images
- At most 3 different categories, each in its own folder

---

### Open VSCode

1. Create a folder for Week 8
2. Bring in your images folder
3. Create a Jupyter file `creating_dataset.ipynb`

---

### Let's run some code

Follow the instructions!

---

### Manual Labeling

If you have a small set of images, you can manually label them by assigning each image a category or caption. This can be done using a spreadsheet or a text editor.

---

### What is a JSON dataset?

JSON stands for JavaScript Object Notation, a popular way of organizing data so that it's easy to read, share, and use.

---

In a **JSON dataset**, the data is stored as key-value pairs, similar to a dictionary in real life where you can look up a word (the key) and find its definition (the value).

---

```json
{
  "toys": [
    { "name": "Action Figure", "price": 10, "category": "Action Figures" },
    { "name": "Doll", "price": 15, "category": "Dolls" },
    { "name": "Stuffed Animal", "price": 5, "category": "Stuffed Animals" }
  ]
}
```

---

In this example, the keys are "name", "price", and "category", and their values are the specific details about each toy. Each toy's key-value pairs are surrounded by curly braces `{}` and separated by commas. Multiple toys (or data entries) are placed inside square brackets `[]` as a JSON array.

---

By using JSON datasets, we can easily share and access information, just as a toy store owner could quickly find the price of a specific toy or see how many toys are in a particular category. Pretty cool, isn't it?

---

### In the JSON file: editing a caption

1. Locate the Caption key of the image entry you want to update.
2. Change the text between the quotation marks following the Caption key.

---

**Make sure to maintain the JSON format:**
- Strings should be in double quotes.
- Objects (key-value pairs) should be contained in curly braces `{}`.
- Arrays (lists of items) should be contained in square brackets `[]`.
- Separate items in objects or arrays with commas.

---

### Validate JSON Syntax

After editing, validate the JSON to make sure there are no syntax errors. You can use online validators such as JSONLint or built-in functions in your text editor. For instance, in VS Code, you can install a JSON linter extension that will highlight errors.

---

### Save Your Changes

Once you have finished editing and validated your JSON syntax, save your changes. Use File > Save in your editor to overwrite the original file, or File > Save As... to create a new file if you want to keep the original intact.

---

### Pros of Manually Editing JSON Files

1. **Control:** Manual editing provides complete control over the changes, allowing precise adjustments to data entries.

<iframe src="https://giphy.com/embed/3osxYc2axjCJNsCXyE" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/animation-ryan-seslow-3osxYc2axjCJNsCXyE"></a></p>

---

2. **Simplicity:** For small changes, manual editing can be straightforward and faster than writing a script, especially if you're familiar with the data structure.

<iframe src="https://giphy.com/embed/aWXSKuhUPFPdLQoEqt" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/punchvisual-postproduction-fix-it-in-the-post-fixitinthepost-aWXSKuhUPFPdLQoEqt"></a></p>

---

3. **No Tool Dependency:** Manual editing doesn't require additional tools or programming skills beyond a basic text editor and knowledge of JSON syntax.
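
The syntax check from the validation step can also be run from the notebook instead of an online validator. The following is a minimal sketch using Python's standard `json` module; the helper name `check_json` and the two sample strings are made up for illustration and are not part of the course material.

```python
import json


def check_json(text):
    """Return (True, None) if text is valid JSON, else (False, where it failed)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno point at the place where parsing stopped.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None


# Hypothetical samples: the second one is missing a comma between two pairs.
good = '{"toys": [{"name": "Doll", "price": 15}]}'
bad = '{"toys": [{"name": "Doll" "price": 15}]}'

print(check_json(good))
print(check_json(bad))
```

To check a real file, read its contents first (for example with `open(path).read()`) and pass the resulting string to the helper.
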

<iframe src="https://giphy.com/embed/3oKIPqsXYcdjcBcXL2" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/animation-stopmotion-tools-3oKIPqsXYcdjcBcXL2"></a></p>

---

### Cons of Manually Editing JSON Files

1. **Error-Prone:** Manual editing increases the risk of introducing syntax errors, such as missing commas or mismatched braces, which can corrupt the JSON file.

<iframe src="https://giphy.com/embed/mq5y2jHRCAqMo" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/windows-vaporwave-error-mq5y2jHRCAqMo"></a></p>

---

2. **Scalability:** Manual editing does not scale. As the data grows or changes become more frequent, it becomes tedious and impractical.

<iframe src="https://giphy.com/embed/o2Ps6kuAwu40oAkBmY" width="480" height="318" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/MonkexNFT-power-metis-metisians-o2Ps6kuAwu40oAkBmY"></a></p>

---

3. **Lack of Automation:** There’s no easy way to replicate manual edits across multiple files or entries without repeating the entire process, which is inefficient.

<iframe src="https://giphy.com/embed/LqJ0TeiE0k3Qho2DRO" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/no-bots-spambots-spambot-bugger-off-LqJ0TeiE0k3Qho2DRO"></a></p>

---

4. **No Audit Trail:** Manual editing does not provide an inherent log of changes or revisions, making it difficult to track history or revert to previous versions unless you use version control.

<iframe src="https://giphy.com/embed/d83cKe2sejxzOiDn4U" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/TransparencyInternational-audit-transparency-international-corruptionary-d83cKe2sejxzOiDn4U"></a></p>

---

5. **Data Integrity:** There is a greater risk of human error, potentially leading to data inconsistency or loss.

![data-integrity](https://hackmd.io/_uploads/HkwerTLZC.jpg)

---

### Best Practices for Handling JSON Data

1. **Automate with Scripts:** Use scripts for repetitive tasks or bulk updates. Languages like Python can load, modify, and save JSON data while handling errors more gracefully.

<iframe src="https://giphy.com/embed/l1KtYG8BndKBmWrM4" width="480" height="360" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/spongebob-season-4-spongebob-squarepants-l1KtYG8BndKBmWrM4"></a></p>

---

2. **Version Control:** Always keep your JSON files under version control (e.g., Git). This provides a history of changes, allows for easy reversion to previous versions, and supports collaboration. :bulb:

<iframe src="https://giphy.com/embed/kH6CqYiquZawmU1HI6" width="480" height="220" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/devrock-code-edr-escueladevrock-kH6CqYiquZawmU1HI6"></a></p>

---

3. **Backup:** Regularly back up your data, especially before making manual changes, to prevent data loss.

![backup](https://hackmd.io/_uploads/SksyKaL-0.jpg)

---

4. **Use a Database:** For complex or large datasets, consider using a database management system. For JSON-like structures, NoSQL databases like MongoDB can be more appropriate and offer indexing, querying, and more robust data integrity.
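
The "Automate with Scripts" practice can be sketched in a few lines of Python. This is only an illustration under assumptions: the dataset is taken to be a list of entries with `Image` and `Caption` keys, and the file name `dataset.json`, the sample entries, and the helper `update_caption` are all hypothetical. Your own file may be laid out differently.

```python
import json
import tempfile
from pathlib import Path


def update_caption(path, image_name, new_caption):
    """Load the dataset, change one caption, save it back. Returns True if an entry changed."""
    path = Path(path)
    # json.loads raises json.JSONDecodeError if the file has a syntax error.
    entries = json.loads(path.read_text(encoding="utf-8"))
    changed = False
    for entry in entries:
        if entry.get("Image") == image_name:  # assumed key name
            entry["Caption"] = new_caption
            changed = True
    if changed:
        # json.dumps always writes valid JSON, so commas and quotes cannot be mistyped.
        path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return changed


# Demo on a throwaway file so the sketch never touches real data.
demo_path = Path(tempfile.mkdtemp()) / "dataset.json"
demo_path.write_text(json.dumps([
    {"Image": "cat_01.jpg", "Caption": "a cat"},
    {"Image": "dog_01.jpg", "Caption": "a dog"},
]), encoding="utf-8")

update_caption(demo_path, "cat_01.jpg", "a sleepy cat on a sofa")
print(demo_path.read_text(encoding="utf-8"))
```

Because the script rewrites the whole file, it pairs well with the backup and version control practices: keep a copy or a commit before running a bulk update.
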

<iframe src="https://giphy.com/embed/vISmwpBJUNYzukTnVx" width="480" height="251" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/vISmwpBJUNYzukTnVx"></a></p>

---

### Mosaic Generator

**Let's generate a mosaic from our JSON dataset**

---

# Image processing

![pillow](https://hackmd.io/_uploads/rJZl6T8-0.jpg)

---

### Pillow (PIL Fork)

* PIL stands for Python Imaging Library, and Pillow is the friendly PIL fork.
* It supports a wide range of image formats such as PPM, JPEG, TIFF, GIF, PNG, and BMP.
* It can perform many operations on images, such as rotating, resizing, cropping, and converting to grayscale.

---

# Images define the world

...each image has its own story and contains a lot of information that can be useful in many ways. This information can be extracted with a technique known as image processing.

<iframe src="https://giphy.com/embed/w6bXcyD53Kh5CFjhsY" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/art-geometry-code-w6bXcyD53Kh5CFjhsY"></a></p>

---

Image processing is a core part of **computer vision**, which plays a crucial role in many real-world applications like **robotics**, self-driving cars, and **object detection**.

<iframe src="https://giphy.com/embed/es40jyb1I3JkCvt31a" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/art-augmented-reality-mobart-es40jyb1I3JkCvt31a"></a></p>

---

## What is image processing?

As the name says, image processing means processing an image, which may involve many different techniques before we reach our goal. The final output can be either an image or a feature extracted from that image, which can then be used for further analysis and decision making.

---

### Image Processing Operations

#### Let's solarise our images

1. Choose a folder

```python
from PIL import ImageOps  # im0 is an image already opened with PIL's Image.open()
im = ImageOps.solarize(im0, 128)  # invert the pixel values above the threshold 128
im = ImageOps.posterize(im0, 1)   # keep 1 bit per colour channel (overwrites the solarised im)
```

---

### Let's do some other processing operations and create some art!
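
A few other operations to experiment with, as a minimal sketch using Pillow's `Image` and `ImageOps` modules. The test image is generated in memory so the code runs without any files; the variable names and the image size are illustrative choices, not part of the class instructions.

```python
from PIL import Image, ImageOps

# Build a small RGB gradient in memory (64 wide, 48 high) to stand in for a photo.
im0 = Image.new("RGB", (64, 48))
for y in range(48):
    for x in range(64):
        im0.putpixel((x, y), (x * 4, y * 4, 100))

solarized = ImageOps.solarize(im0, threshold=128)  # invert bright values, keep dark ones
posterized = ImageOps.posterize(im0, 1)            # 1 bit per channel: each value becomes 0 or 128
gray = ImageOps.grayscale(im0)                     # single-channel "L" image
mirrored = ImageOps.mirror(im0)                    # flip left to right
rotated = im0.rotate(90, expand=True)              # expand=True grows the canvas to fit
small = im0.resize((32, 24))                       # half the width and height

for name, im in [("solarized", solarized), ("posterized", posterized),
                 ("gray", gray), ("mirrored", mirrored),
                 ("rotated", rotated), ("small", small)]:
    print(name, im.mode, im.size)
```

In the notebook you would replace the generated `im0` with `Image.open(...)` on one of your own images, and call `.save(...)` on any result you want to keep.
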