Muhammad Adib Zamzam
# ChatGPT - sums #2

https://hackmd.io/J19mcY3mSpWmjvZiUerAjw

[TOC]

## overview

Possible causes:

1. Data Quality: ?
2. Model Architecture: BART / indobartv2 / MBART
3. Hyperparameters:
    1. learning rate: 5e-5
    2. batch size: 16
    3. epoch: 70/120
4. Training Data Size: 15k, Test data: 3k
5. Regularization:

### ai transformer

An AI transformer is a type of neural network architecture used in natural language processing (NLP) tasks, such as machine translation, text classification, and question answering. The transformer architecture was first introduced in 2017 in a paper called "Attention is All You Need" by Vaswani et al.

The transformer architecture is based on the self-attention mechanism, which allows the model to attend to different parts of the input sequence when generating the output. Unlike earlier sequence-to-sequence models built on recurrent or convolutional neural networks, the transformer relies on attention alone and, unlike RNNs, does not process the input token by token, which makes training faster and more parallelizable.

The transformer architecture consists of an encoder and a decoder. The encoder processes the input sequence and produces a sequence of contextual vector representations, one per input token, while the decoder attends to the encoder's output and generates the output sequence one token at a time. Both the encoder and decoder use multiple layers of self-attention and feedforward neural networks.

The transformer architecture has shown impressive results in many NLP tasks, and its popularity has led to the development of several pre-trained transformer-based language models, such as BERT, GPT-2, and T5. These pre-trained models can be fine-tuned on specific tasks with small amounts of task-specific data, making them a popular choice for NLP applications.

## Muhammad Adib Zamzam

I have a problem when generating summaries. Many sequences and tokens are not produced properly. What am I doing wrong?

### ans

There could be several reasons why your summarization model is not producing proper sequences and tokens. Here are a few things to consider:

1. Data Quality: The quality of your training data could be affecting the performance of your model. If the training data is noisy or incomplete, the model may not learn the correct patterns and structures needed for proper summarization.
2. Model Architecture: The architecture of your summarization model could also be a factor. Make sure that you are using a model architecture that is capable of capturing the necessary nuances of the input data. You may want to consider using a pre-trained transformer-based language model, which can provide better results than traditional models.
3. Hyperparameters: The hyperparameters of your model, such as the learning rate, batch size, and number of training epochs, can also impact the performance of your model. Try experimenting with different hyperparameters to see if that improves the output.
4. Training Data Size: The amount of training data that you are using could also be a factor. If you are using a small amount of data, the model may not be able to learn the necessary patterns and structures. Consider increasing the size of your training data.
5. Evaluation Metrics: Make sure that you are using appropriate evaluation metrics to assess the performance of your model. For example, the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics are commonly used to evaluate the quality of summarization outputs.
6. Pre-processing: Ensure that the input text is properly pre-processed. This can include tasks such as tokenization, stemming, and stop word removal. Proper pre-processing can help to improve the quality of the summarization output.
7. Regularization: Use regularization techniques such as dropout, early stopping, and weight decay to prevent overfitting and improve the generalization performance of your model.

I hope this helps!
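
To make the ROUGE suggestion above concrete, here is a minimal sketch of ROUGE-N computed by hand. It assumes whitespace tokenization and lowercasing, which is simpler than what established ROUGE packages do, and the function name `rouge_n` is only illustrative, so treat the numbers as a rough sanity check rather than as reportable scores.

```python
from collections import Counter


def rouge_n(reference: str, candidate: str, n: int = 1) -> dict:
    """Return ROUGE-N precision, recall and F1 using whitespace tokens."""

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((ref & cand).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge_n("the cat sat on the mat", "the cat lay on the mat")
    print(scores)
```

Comparing such scores on a handful of generated summaries against their references is a cheap way to see whether the broken outputs are rare or systematic before changing anything in training.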
## Muhammad Adib Zamzam

### myspec

My spec, please evaluate:

1. Data Quality: moderate
2. Model Architecture: BART - indobartv2 - especially MBART
3. Hyperparameters:
    1. learning rate: 5e-5
    2. batch size: 16
    3. epoch: 70/120
4. Training Data Size: 15k, Test data: 3k
5. Evaluation Metrics: ROUGE.
6. Pre-processing: I don't think stemming and stop word removal contribute to transformer learning for summarization.
7. Regularization:
    1. dropout
    2. early stopping
    3. weight decay

### ans

Based on the information you have provided, here is my evaluation of your summarization model:

1. Data Quality: Moderate quality data may have some noise and errors that can negatively impact the performance of your summarization model. However, if the data is representative of the target domain, it can still provide useful insights.
2. Model Architecture: BART and indobartv2 are transformer-based encoder-decoder models that are well suited to summarization tasks. mBART is a multilingual version of BART pretrained on text in many languages, which can help with Indonesian text provided the checkpoint covers Indonesian (mBART-50 does).
3. Hyperparameters: The hyperparameters you have chosen appear reasonable. A learning rate of 5e-5 and batch size of 16 are commonly used values for transformer-based models. However, the optimal number of epochs can vary depending on the dataset and task, so you may need to experiment with different values to find the best performance.
4. Training Data Size: The size of your training data is relatively small, which can be a limitation. However, if the data is representative of the target domain and contains diverse examples, it may still be sufficient to train a good model.
5. Evaluation Metrics: ROUGE is a commonly used evaluation metric for summarization tasks, and it is a good choice for measuring the quality of the generated summaries.
6. Pre-processing: While stemming and stop word removal may not have a significant impact on transformer-based models, it is still important to perform basic text pre-processing to ensure that the input data is properly formatted and cleaned.
7. Regularization: Regularization techniques such as dropout, early stopping, and weight decay are commonly used to prevent overfitting and improve the generalization performance of your model. It is good to see that you have included these techniques in your training process.

Overall, based on the information you have provided, your model architecture and hyperparameters appear to be reasonable. However, the quality and size of your training data could be potential limitations. I would recommend conducting a thorough evaluation of your model using different metrics and validation techniques to ensure that it is performing well on your target task.

## Muhammad Adib Zamzam

Training for more than 72 epochs gives me flat training loss and eval loss, so I think that is the optimal epoch count.

### ans

If your model is achieving a flat training and evaluation loss after 72 epochs, it is possible that your model has converged and additional training may not improve the performance further. In such cases, it is generally recommended to stop training and use the checkpoint with the best performance on the validation set.

However, it is important to note that in some cases, stopping training too early can lead to suboptimal performance. If you suspect that the model could benefit from additional training, you can try increasing the number of epochs or adjusting the learning rate to see if that improves the performance. Additionally, you can try other regularization techniques like adding noise or applying different types of dropout, which can help to prevent overfitting and improve the generalization performance of your model.

## Muhammad Adib Zamzam

Should I apply regularization before the fine-tuning or after the fine-tuning? I ask because I already have the 72-epoch checkpoint.

### ans

Dropout and weight decay act during training, and early stopping is a rule for when to end training, so none of them is a separate step applied before or after fine-tuning. If you have already fine-tuned your model for 72 epochs and have a checkpoint, you can change these settings and continue training from that checkpoint.

If you are concerned about overfitting and want to prevent the model from memorizing the training data too closely, it may be helpful to use regularization from the start of training. However, if your model is already performing well on the validation set and you don't see evidence of overfitting, it may not be necessary to apply additional regularization techniques.

In any case, it is important to monitor the performance of your model during training and evaluate the impact of any changes you make. You can do this by tracking metrics like training and validation loss, and by using evaluation metrics like ROUGE to assess the quality of the generated summaries.

## Muhammad Adib Zamzam

What does dropout actually do?

### ans

Dropout is a regularization technique commonly used in deep learning to prevent overfitting. It works by randomly dropping out (i.e., setting to zero) some of the neurons in a neural network during training.

During training, dropout randomly sets a fraction of the input units of a layer to zero at each update. This means that each neuron in the layer has a probability of being dropped out at each training iteration. The dropout rate is typically set between 0.2 and 0.5, although transformer models such as BART commonly default to about 0.1.

By randomly dropping out neurons during training, dropout can help prevent overfitting by forcing the network to learn redundant representations for the same input. This makes the network more robust and better able to generalize to new inputs.

During inference, dropout is turned off, and the full network is used to make predictions. This approximates averaging the outputs of the many thinned networks sampled during training, which can help to reduce the impact of noisy or irrelevant inputs.

Overall, dropout is a powerful regularization technique that can help prevent overfitting and improve the generalization performance of deep neural networks.

## Muhammad Adib Zamzam

Retraining costs me money. What if the problem is in the data and not in the regularization?

### ans

Retraining a model can be expensive in terms of both time and resources, especially if you are working with large datasets and complex models. If you suspect that the problem with your model is related to the data rather than the regularization techniques, there are several steps you can take to improve the quality of your data and keep the number of retraining runs to a minimum.

First, you can try to collect more data or augment your existing data to increase the variety and complexity of your training set. This can help the model learn more robust and generalizable representations of the data and reduce the impact of noisy or irrelevant inputs.

Second, you can try to preprocess your data to remove any outliers or errors that may be affecting the performance of your model. This can include techniques like outlier detection, data cleaning, and feature engineering.

Finally, you can try to use different evaluation metrics to assess the quality of your model and identify areas where it may be underperforming. This can help you to pinpoint specific issues with the data or the model architecture and make targeted improvements.

Overall, improving the quality of your data can be an effective way to address problems with your model and avoid spending on retraining runs that would not help. However, it is important to carefully evaluate the performance of your model and experiment with different techniques to determine the best approach for your specific problem.
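
One inexpensive way to check whether the data is the culprit, before paying for another training run, is to audit the (document, summary) pairs directly. The sketch below is only an illustration: the function name `audit_pairs`, the whitespace tokenization, and both thresholds are assumptions, not values taken from this conversation. It counts pairs that are empty, pairs whose summary is almost as long as the source, and pairs whose summary shares few words with the source.

```python
def audit_pairs(pairs, max_ratio=0.8, min_coverage=0.3):
    """Count suspicious (document, summary) pairs. Thresholds are illustrative."""
    report = {"total": 0, "empty": 0, "too_long": 0, "low_coverage": 0}
    for document, summary in pairs:
        report["total"] += 1
        doc_tokens = document.lower().split()
        sum_tokens = summary.lower().split()
        if not doc_tokens or not sum_tokens:
            report["empty"] += 1
            continue
        # A summary nearly as long as its source is probably not a summary.
        if len(sum_tokens) > max_ratio * len(doc_tokens):
            report["too_long"] += 1
        # Share of summary tokens that also occur in the document.
        vocab = set(doc_tokens)
        coverage = sum(token in vocab for token in sum_tokens) / len(sum_tokens)
        if coverage < min_coverage:
            report["low_coverage"] += 1
    return report


if __name__ == "__main__":
    sample = [
        ("the cat sat on the mat near the door today", "the cat sat"),
        ("", "something"),
    ]
    print(audit_pairs(sample))
```

If a noticeable share of the 15k training pairs lands in any of these buckets, that points at the data rather than at the regularization settings.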
## Muhammad Adib Zamzam

What are the ways to improve summarization data at minimal cost?

### ans

Improving the quality of your data is crucial for achieving better summarization results, and there are several ways to do this without incurring significant costs. Here are a few suggestions:

1. Data Cleaning: One way to improve the quality of your summarization data is to perform data cleaning. This involves removing any unnecessary or irrelevant information from the dataset, correcting any errors or inconsistencies, and making sure that the data is properly formatted. Data cleaning can help to reduce noise and improve the accuracy of your summarization model.
2. Data Augmentation: Another way to improve the quality of your summarization data is to perform data augmentation. This involves creating new training examples from existing data by applying techniques like paraphrasing, back-translation, or text insertion. Data augmentation can help to increase the diversity and complexity of your training set, and improve the generalization performance of your model.
3. Transfer Learning: Transfer learning involves leveraging pre-trained language models like BERT or GPT to fine-tune a model on your summarization task. This can help to improve the performance of your model while reducing the amount of training data required. For instance, you can use a pre-trained model like BART as a starting point and fine-tune it on your summarization task.
4. Data Annotation: Adding annotations to your summarization data can help to improve the quality of your model. Annotations can include summaries written by humans, as well as labels for different aspects of the text (e.g., entities, sentiment, etc.). Annotations can be expensive, but there are many crowdsourcing platforms available that can make the process more affordable.
5. Domain-specific training: Training your model on domain-specific data, such as news articles, scientific papers, or legal documents, can help to improve the quality of your summarization results. This is because domain-specific data often has unique characteristics and language patterns that are specific to that domain.

Overall, there are many ways to improve the quality of your summarization data without incurring significant costs. By leveraging techniques like data cleaning, data augmentation, transfer learning, data annotation, and domain-specific training, you can improve the performance of your summarization model and generate more accurate and informative summaries.
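
Of the options above, data cleaning is usually the cheapest to try first. The following is a minimal sketch under stated assumptions: it presumes the data is a list of (document, summary) string pairs, possibly with leftover HTML from scraping, and the helper names `normalize` and `clean_and_dedupe` are hypothetical. It unescapes HTML entities, strips tags, collapses whitespace, and drops empty pairs and exact duplicate documents.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def normalize(text: str) -> str:
    """Unescape HTML entities, strip tags, and collapse whitespace."""
    text = html.unescape(text)
    text = TAG_RE.sub(" ", text)
    return " ".join(text.split())


def clean_and_dedupe(pairs):
    """Return normalized pairs without empty texts or duplicate documents."""
    seen = set()
    cleaned = []
    for document, summary in pairs:
        doc, summ = normalize(document), normalize(summary)
        if not doc or not summ:
            continue  # nothing left after cleaning
        key = doc.casefold()
        if key in seen:
            continue  # same document already kept once
        seen.add(key)
        cleaned.append((doc, summ))
    return cleaned


if __name__ == "__main__":
    raw = [("<p>Harga  beras naik &amp; stok turun.</p>", " Harga beras naik. ")]
    print(clean_and_dedupe(raw))
```

Only exact duplicates (after normalization and case folding) are removed here; near-duplicate detection would need a similarity measure and is left out of this sketch.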
