Hamze GHALEBI

@Hamze

I'm a causal machine learning fanatic and social systems junkie. Follow me as I use data to unravel the complexities of human behavior.

Prime membership

Joined on Dec 1, 2022

  • Rust’s strengths in performance, safety, and concurrency are gaining ground in the demanding realm of AI and Large Language Models (LLMs). Once a niche language in this space, Rust now supports a vibrant and growing ecosystem of machine learning tools, from lean inference engines to robust vector database clients. However, this rapid expansion can make it difficult to pinpoint the right crate for your needs, whether you're deploying models on edge devices, fine-tuning transformers, or managing prompt flows via API. This article offers a structured reference to guide your selection. We've categorized key Rust libraries across functional domains, including: Inference Engines: Candle – A minimalist ML framework supporting fast CPU/GPU inference with models like Transformers and Whisper. Backed by Hugging Face.
  • A Coder's Guide to the Official Rust MCP Toolkit (rmcp) Hey there! Ever wondered how super-smart computer programs, like the AI assistants you might chat with, can use other tools or get information from different places? Like, how does an AI know the weather, or how can it use a calculator? That's where something called the Model Context Protocol (MCP) comes in. Let's break it down! I. What is this MCP Thing? Imagine you have a super-smart AI, called a Large Language Model (LLM) – think of things like ChatGPT or Claude. These AIs are great at understanding and generating text, but they often live inside their own "digital brain." Now, imagine you want this AI to do something specific, like look up the latest score for your favorite sports team or use a special calculator you built. The AI needs a way to talk to the outside world – other computer programs, databases (where data is stored), or tools.
  • List of Repositories Here are 10 examples of GitHub repositories using the Rust SDK for MCP, based on available information: cyberelf/mcp_rustdoc - An MCP server for querying Rust API documentation from docs.rs. Govcraft/rust-docs-mcp-server - Provides up-to-date documentation context for Rust crates to LLMs. pinecone-io/assistant-mcp - Connects to Pinecone Assistant for context from its knowledge engine. nwiizo/tfmcp - A Terraform MCP server for managing Terraform environments. sapientpants/sonarqube-mcp-server - Integrates with SonarQube for code quality metrics and issues. chaindead/telegram-mcp - Likely integrates MCP with Telegram for communication purposes. metoro-io/metoro-mcp-server - An MCP server focused on monitoring applications.
  • 🦀 MCP in Rust: A Practical Guide using rmcp This guide demonstrates how to use the Model Context Protocol (MCP) in Rust with the rmcp crate. MCP allows AI systems to interact with tools via JSON-RPC 2.0. Rust's performance and safety make it a great choice for building these tools. We'll build several example MCP servers, from simple file operations to more complex KYC tasks. 1. Environment Setup First, add the necessary dependencies to your Cargo.toml:
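To make the JSON-RPC 2.0 framing mentioned in the rmcp guide preview concrete, here is a minimal sketch of what a tool-call message could look like on the wire. It uses only the Python standard library rather than rmcp itself. The `tools/call` method name follows the MCP specification as I understand it; the tool name `read_file`, its `path` argument, and the helper functions are hypothetical and chosen purely for illustration.

```python
import json


def make_request(request_id, method, params):
    """Build a JSON-RPC 2.0 request object as a plain dict."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def make_result(request_id, result):
    """Build a JSON-RPC 2.0 success response that echoes the request id."""
    return {"jsonrpc": "2.0", "id": request_id, "result": result}


# Hypothetical tool call: the tool name and its arguments are assumptions.
request = make_request(
    1,
    "tools/call",
    {"name": "read_file", "arguments": {"path": "notes.txt"}},
)

# Serialize to the text that would travel over the transport, then decode it
# again the way a server would before dispatching to a tool handler.
wire = json.dumps(request)
decoded = json.loads(wire)
response = make_result(decoded["id"], {"ok": True})
```

The response reuses the request's `id`, which is how a JSON-RPC client matches replies to the calls it sent.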
  • I. Executive Summary: Optical Character Recognition (OCR) on low-quality PDF documents poses significant challenges due to noise, blur, and low resolution. This report analyzes various OCR applications and libraries in Rust, Python, and as command-line tools for such documents. While traditional engines like Tesseract need substantial preprocessing, deep learning-based solutions like EasyOCR and docTR in Python show promise. OCRmyPDF, a Python tool leveraging Tesseract with image processing and optimization, is also a strong contender. The report details performance, accuracy claims, and usability, offering recommendations for optimizing OCR on degraded PDFs. II. Introduction: Optical Character Recognition (OCR) is vital for extracting text from images, scanned documents, and PDFs into machine-readable text, crucial for archiving, automation, and analysis. Low-quality PDFs present obstacles like noise, blur, low resolution, skewing, and compression artifacts [1]. Older documents (e.g., newspapers) with small fonts, dense columns, and background clutter are particularly challenging [1]. Manual correction of errors from standard OCR can be more time-consuming than retyping [1]. Layout complexities, such as newspaper column separators, can confuse OCR engines [1]. The initial scan quality is key to OCR accuracy, and post-processing has limited effectiveness on severely degraded inputs [2]. Solutions are needed that are accurate, fast, and can handle degradation in low-quality PDFs. This report compares high-performance OCR options in Rust, Python, and as command-line applications. III. Rust-Based OCR Solutions: A. ocrs: ocrs is a new, open-source OCR engine in Rust, emphasizing user-friendliness and cross-platform compatibility [21]. It aims for accurate text extraction from various images with minimal preprocessing using machine learning [21]. 
Currently in early preview, it primarily supports the Latin alphabet (e.g., English) [21], with plans for more languages [21]. Its architecture uses neural networks trained with PyTorch, exported to ONNX, and run with the RTen inference engine [21]. Available as a Rust library and a CLI tool [21], the CLI offers basic OCR, JSON output with layout, and image annotation [21]. Building in release mode is crucial for performance [23]. While promising due to its ML approach, its early stage and limited language support might restrict its immediate use for all low-quality PDFs, especially those with non-Latin scripts [21]. Further performance evaluation on diverse low-quality PDFs is needed.
  • Here are the top recommendations for highly performant OCR tools suitable for processing low-quality PDFs, categorized by language/framework and command-line options, with an emphasis on accuracy and efficiency: 1. Python-Based Solutions a. Marker (Command-Line & Python) Description: A high-performance open-source tool optimized for converting PDFs (including low-quality scans) into structured formats like Markdown/JSON. It uses surya OCR (a modern engine) and optionally integrates LLMs (e.g., Gemini) to enhance accuracy. Features: Handles multi-column layouts, tables, equations, and damaged text. GPU acceleration for faster processing (supports H100, MPS, or CPU). Built-in preprocessing (e.g., deskewing, noise removal) tailored for low-quality documents. Installation:
  • Introduction Extracting text from low-quality French-language PDFs is challenging due to issues like blurry scans, noise, and the presence of diacritics (accents) in French text. OCR (Optical Character Recognition) tools must be accurate in recognizing French characters and words, robust against image noise, and efficient to handle large documents. This report identifies leading OCR solutions implemented in Rust or Python or available as command-line tools, focusing on those known for high accuracy on French text even when the input scans are degraded. We prioritize open-source tools, but also include a few exceptional commercial OCR engines known for top-tier performance. Key factors considered include: Recognition accuracy for French (including correct handling of accents and French-specific spelling/grammar) Robustness to poor image quality (blurry text, noise, compression artifacts, skewed pages, etc.) Performance and scalability (speed of OCR and ability to process many pages or use hardware acceleration) Usability from the command line or via APIs (Rust crates or Python libraries) for integration into workflows Output formats supported (plain text extraction, searchable PDF output, or structured layouts like hOCR/ALTO XML) Below, we explore each tool in detail, then provide a comparison table and usage examples.
  • Hello there! Welcome to GUI (Graphical User Interface) development with Rust and the egui library. I'm Gemini, your guide. Think of me as a friendly teacher who's helped many beginners like you get started with Rust. We'll take it slow and steady. Don't worry if things seem new – we'll break them down. Let's begin! I. Introduction to egui: Your Friendly GUI Toolkit What's a GUI? It's the visual part of an app – buttons, text boxes, sliders – things you click and interact with.
  • Hello there! Welcome to your first steps into the exciting world of Graphical User Interface (GUI) development with Rust, using the wonderful egui library. My name is Gemini, and I'll be your guide. Think of me as someone who's been teaching Rust for a good couple of decades – I've seen it all, and my main goal is to make this journey clear, engaging, and successful for you. We'll take it step-by-step. Don't worry if some concepts seem new; we'll break them down. Remember, every expert was once a beginner! This tutorial uses the excellent structure and information you provided to get you started. A Note on Structure: We'll follow a clear path from introduction to more advanced topics. I'll use headings, lists, and bold text to keep things organized, which can be helpful for staying focused. Let's dive in! I. Introduction to egui: Your Friendly GUI Toolkit So, what exactly is egui?
  • Resources for learning reinforcement learning will be added to this document gradually. Markov Decision Processes 1 - Value Iteration | Stanford CS221: AI (Autumn 2019) https://www.youtube.com/watch?v=9g32v7bK3Co&t=9s An Intro to Markov chains with Python! https://www.youtube.com/watch?v=WT6jI8UgROI
  • Note: This idea is in its nascent stages and not yet fully mature. I am actively seeking advice to further articulate and refine this concept. Abstract One of the core challenges in crafting inclusive financial products lies in the fixed costs associated with training human expertise, particularly for highly regulated and complex fields such as banking. This proposal aims to investigate the feasibility of using Large Language Models (LLMs) with multi-agent capabilities to simulate a "virtual company" that can adapt to niche requirements, thereby driving down the costs of specialized knowledge and enabling more inclusive financial solutions. Challenges Expertise Training: Fixed costs in training human expertise in specialized domains hinder the scalability of inclusive financial services. Complexity & Regulation: Banking is a heavily regulated field that also entails complex decision-making, making it difficult to offer tailored solutions for minority communities. Theoretical Foundations
  • For End-Users As a user, I want to contact customer service via email so that I can ask for help. Acceptance Criteria: Provide an easily accessible "Contact Us" button. Enable email form or direct email link. As a user, I want to know when I will receive a response so that I can manage my expectations. Acceptance Criteria: Auto-reply with an estimated response time. Update the user if the initial estimate changes.
  • Our mission for the second half of 2023 is to build a secure, easy-to-use online banking system. We aim to create a strong foundation for managing user accounts and safeguarding their information. This involves creating a secure login system, a card management feature, and a transparent transaction process. To enhance the user experience, we will integrate customer support and adhere to financial regulations by implementing a KYC system. We've kept flexibility in our plan towards the end of the year to make improvements based on user feedback. In 2024, we aim to grow, potentially expanding to a mobile platform. By 2025, we plan to integrate more partner solutions for a well-rounded financial platform. Our ultimate goal is to simplify digital banking, foster trust and meet our users' evolving needs.
  • Introduction Over the past decade, artificial intelligence (AI) technologies have evolved rapidly. Many commercial breakthroughs are due to deep learning, a specific approach to machine learning in which complex neural networks are trained on large amounts of data (LeCun et al., 2015). Deep learning has spurred activity in several subfields of AI, such as natural language processing (NLP) with large language models (LLMs), which has led to sophisticated commercial products for applications such as automatic language generation. In this part, we highlight first the definition and techniques of large language models, and then the major players behind these models along with some examples of the main large language models they have developed. Definition and techniques of large language models Technically, a language model is a statistical representation of a language that tells us the probability that a given sequence (a word, an expression, or a sentence) occurs in that language. Thanks to this capability, language models can be used to predict the continuation of a sentence and, consequently, to generate text. The main difference between large language models (LLMs) and ordinary language models is that large language models are trained on massively larger quantities of text with exponentially greater computing power.
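The definition in the preview above, a language model as a statistical representation that assigns a probability to a sequence and can predict how a sentence continues, can be illustrated with a toy bigram model. This is only a sketch under simplifying assumptions: the three-sentence corpus, the `<s>` and `</s>` boundary tokens, and the function names are all hypothetical, and real large language models use neural networks rather than raw counts.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def sequence_probability(counts, sentence):
    """Multiply the conditional probabilities P(cur | prev) along the sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        total = sum(counts[prev].values())
        if total == 0:
            return 0.0  # the previous token was never seen in training
        prob *= counts[prev][cur] / total
    return prob


def predict_next(counts, word):
    """Return the most frequent continuation of `word`, or None if unseen."""
    following = counts[word]
    return following.most_common(1)[0][0] if following else None


# Toy corpus, chosen only for illustration.
corpus = ["the cat sleeps", "the cat eats", "the dog sleeps"]
model = train_bigram(corpus)
```

With this corpus, `sequence_probability(model, "the cat sleeps")` multiplies 1 × 2/3 × 1/2 × 1, giving 1/3, and `predict_next(model, "the")` returns `"cat"` because "cat" follows "the" more often than "dog" does.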
  • OpenAI is a research organization that focuses on developing artificial intelligence in a safe and beneficial way. It was founded in 2015 by a group of entrepreneurs including Elon Musk, Sam Altman, Greg Brockman, Ilya Sutskever, John Schulman, and Wojciech Zaremba [1]. OpenAI has raised a total of $11B in funding over 6 rounds [2]. As of the end of last year, OpenAI was projecting a loss of more than $508 million for 2022 [3]. ChatGPT is an open-source, end-to-end dialogue system built on top of the GPT-3 language model [4]. It is designed to enable developers to quickly build and deploy conversational AI agents for chatbots [4]. ChatGPT’s potential to reshape the financial landscape has been discussed in Forbes [5]. I. Introduction A. Background of ChatGPT GPT-4 architecture: ChatGPT is built on the GPT-4 architecture, a state-of-the-art language model developed by OpenAI. GPT-4 utilizes deep learning techniques and is trained on a vast dataset to generate human-like responses in a conversational manner. The architecture enables the model to understand and process complex language patterns, making it one of the most advanced AI language models available. Capabilities and use cases:
  • 10-minute video: Introduction: Since I had completed all of my TP sessions using ThingsBoard, I decided to do the final project with a cloud provider, specifically Azure IoT Hub. This project aims to develop an architecture integrating IoT devices with the Azure IoT Hub cloud service using the Azure IoT Python SDK. Architecture: The project architecture involves connecting an IoT device, acting as a sensor, and an IoT Edge device to Azure IoT Hub. The IoT device collects light data, which is then transmitted to the IoT Edge device, an Ubuntu machine. The IoT Edge device acts as an intermediary and sends the data to the Azure IoT Hub cloud service. I also used my smartphone as an IoT device with sensors. The data is visualized on the Azure IoT Hub dashboard, which allows the user to monitor the data in real time. Additionally, the user receives a notification on their smartphone when predefined conditions are met. Use Case:
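The device, edge, and hub flow described in the IoT project preview can be sketched without any cloud dependency. The example below deliberately does not use the Azure IoT Python SDK; it only simulates the data path in plain Python. The payload fields (`deviceId`, `light`), the threshold of 50 lux, and the function names are all assumptions made for illustration, since the preview does not state the actual message format or the notification conditions.

```python
import json

# Hypothetical notification condition: alert when light drops below this value.
LIGHT_THRESHOLD_LUX = 50.0


def build_telemetry(device_id, light_lux):
    """Sensor step: serialize one light reading as a JSON message."""
    return json.dumps({"deviceId": device_id, "light": light_lux})


def edge_forward(raw_message):
    """Edge step: parse the message and flag readings that meet the alert condition."""
    reading = json.loads(raw_message)
    reading["alert"] = reading["light"] < LIGHT_THRESHOLD_LUX
    return reading


# Simulate two readings passing from the sensor through the edge device.
forwarded = [edge_forward(build_telemetry("sensor-1", lux)) for lux in (120.0, 30.0)]

# Hub-side step: select the readings that would trigger a smartphone notification.
alerts = [reading for reading in forwarded if reading["alert"]]
```

In a real deployment the edge step would hand the message to the hub through the SDK's client, and the alert rule would live wherever the project defines its predefined conditions.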
  • A Computational Journalism Project Meeting Minutes Participants: Hamze, Eléonore Date: 18/02/2023 Time: 13:45 - Paris Agenda Launching test for computational journalism Discussion The main idea of the meeting was to find common ground for developing computational journalism and make it more accessible to the public. We discussed the need to choose a subject that we are all passionate about and that is not politically sensitive for the French government. The idea is to make a project that is not only about technology but also about content. After some discussion, we decided to test the idea with a subject that meets all these criteria, which is the extreme right wing in France and their position toward Russia.
  • Graph thinking Exploring the World of Computational Propaganda and Graph Models Welcome to my new newsletter where I share my thoughts, experiences, and reflections on various topics. This edition focuses on my recent fixation, computational propaganda and graph models. As a self-proclaimed search junkie, I've been diving deep into this subject, and I'm excited to share my findings with you. Tabular data has always seemed like a strange concept to me. The idea of taking elements of the world and assuming their value is independent of other values is tempting but ultimately reductionist. With the rapid advancements in computational power and storage techniques, it's becoming increasingly clear that we need a more representative model of the world that takes into account the relationships and connections between different elements. Over the past few months, I've been exposing myself to a variety of techniques and technologies that allow me to work with representative data in a more relational and centered way, specifically graph and network models. In this newsletter, I've compiled a selection of resources on these topics and added a touch of digital propaganda, which is another theme that I'm passionate about. I hope you enjoy this edition, and I look forward to your feedback as I continue to co-develop this newsletter. The format and structure are still a work in progress, but I'm eager to find a balance between my passions and your interests. So, let's dive in and explore the world of computational propaganda and graph models together.
  • Fil Rouge Project Contents Context Project context Project objectives Implementation methodologies Planning