https://github.com/Gardener-gg/email-verifier

1. Create a connection with the SMTP server for the given email (provided the email is in a valid format).
2. Send HELO to initialize the conversation with the server.
3. Send MAIL FROM to indicate the sender's address.
4. Send RCPT TO with the email address to verify.
5. Determine the deliverability of the email based on the status of RCPT TO.
6. Close the connection.

# Single email verification

Request:

```json=
{
    "to_email": "someone@gmail.com",
    "from_email": "my@my-server.com", // (optional) email to use in the `MAIL FROM` SMTP command, defaults to "user@example.org"
    "hello_name": "my-server.com", // (optional) name to use in the `EHLO` SMTP command, defaults to "localhost"
    "proxy": { // (optional) SOCKS5 proxy to run the verification through, default is empty
        "host": "my-proxy.io",
        "port": 1080
    },
    "smtp_port": 587 // (optional) SMTP port to do the email verification, defaults to 25
}
```

Response:

```json=
{
    "input": "someone@gmail.com",
    "is_reachable": "invalid",
    // "misc": {
    //     "is_disposable": false,
    //     "is_role_account": false
    // },
    "mx": {
        "accepts_mail": true,
        "records": [
            "alt3.gmail-smtp-in.l.google.com.",
            "gmail-smtp-in.l.google.com.",
            "alt1.gmail-smtp-in.l.google.com.",
            "alt4.gmail-smtp-in.l.google.com.",
            "alt2.gmail-smtp-in.l.google.com."
        ]
    },
    "smtp": {
        "can_connect_smtp": true,
        "has_full_inbox": false,
        "is_catch_all": false,
        "is_deliverable": false,
        "is_disabled": true
    },
    "syntax": {
        "domain": "gmail.com",
        "is_valid_syntax": true,
        "username": "someone"
    }
}
```

# Bulk Email Verification Specification

## Pipeline

The idea is to put email lists in a message queue.

![pipeline](https://files.readme.io/2ac4d24-verification_pipeline.png)

## API Endpoints Specification

This section lists all the API endpoints to implement for the bulk email verification feature.

Inspiration:

- https://developers.neverbounce.com/docs/verifying-a-list
- https://documentation.mailgun.com/en/latest/api-email-validation.html#bulk-validation
- https://emailverification.whoisxmlapi.com/bulk-api/documentation/get-results-invalid-and-failed-emails

### 1. `POST /v0/bulk`

Create a new job for verifying an email list.

This endpoint can receive input in multiple ways. The `input_type` parameter describes the contents of the `input` parameter. `input_type` can be one of the following values:

- `"remote_url"`
- `"array"`

#### Remote URL

Using a remote URL allows you to host the file and provide us with a direct link to it. The file should be a list of emails separated by line breaks or a standard CSV file. We support most common file transfer protocols and their authentication mechanisms. When using a URL that requires authentication, be sure to pass the username and password in the URI string.

Request:

```json=
{
    "input_type": "remote_url",
    "input": "https://mydomain.com/my_file.csv",
    "proxy": [
        "111.111.111.111,0000,somename@domain.com,hostname.com", // (optional) proxy IP, port, from email, HELO name
        "222.222.222.222,0000,somename@domain.com,hostname.com" // (optional) proxy IP, port, from email, HELO name
    ],
    "smtp_port": 587 // (optional) SMTP port to do the email verification, defaults to 25
}
```

Contents of `my_file.csv`:

```csv
support@domain.com
invalid@domain.com
...
```

Response:

```json=
{
    "job_id": 150970
}
```

#### Email Array

Supplying the data directly gives you the option to create email lists on the fly rather than having to write them to a file. `input` accepts an array of strings, each item representing an email to verify.

Request:

```json=
{
    "input_type": "array",
    "input": [
        "support@domain.com",
        "invalid@domain.com"
    ],
    "proxy": [
        "111.111.111.111,0000,somename@domain.com,hostname.com", // (optional) proxy IP, port, from email, HELO name
        "222.222.222.222,0000,somename@domain.com,hostname.com" // (optional) proxy IP, port, from email, HELO name
    ],
    "smtp_port": 587 // (optional) SMTP port to do the email verification, defaults to 25
}
```

Response:

```json=
{
    "job_id": 150970
}
```

### 2. `GET /v0/bulk/<job_id>`

Get the status of a job, as well as its current progress and a summary.

Response:

```json=
{
    "job_id": 150970, // auto-increment
    "created_at": "2017-04-15T20:00:06.000Z", // ISO 8601
    "finished_at": "2017-04-15T21:52:46.000Z", // undefined if job_status != "complete"
    "total_records": 24606,
    "total_processed": 24606,
    "summary": {
        "total_safe": 18227,
        "total_invalid": 1305,
        "total_risky": 4342,
        "total_unknown": 716
    },
    "job_status": "complete" // values: ["running", "complete"]
}
```

### 3. `GET /v0/bulk/<job_id>/download`

This endpoint gets the results of a job. It returns an error if the job status is not `complete`.

Response:

This endpoint returns an `application/octet-stream` containing the job data as a CSV file.

The CSV file looks like:

```csv=
input,is_reachable,mx.accepts_mail,mx.records,smtp.can_connect_smtp,smtp.has_full_inbox,smtp.is_catch_all,smtp.is_deliverable,smtp.is_disabled,syntax.domain,syntax.is_valid_syntax,syntax.username
someone@gmail.com,invalid,true,"alt3.gmail-smtp-in.l.google.com.,gmail-smtp-in.l.google.com.",true,false,false,false,true,gmail.com,true,someone
```

Note how the `mx.records` string array field is encoded as ``"<mx1>,<mx2>,<...>"``.

### 4. Errors

In case of an error on any endpoint, the JSON to be returned is as follows.

Response:

```json=
{
    "error": "..." // Error string
}
```

The response should also contain the correct HTTP status code.

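As an illustration of the error format above, here is a minimal Python sketch of a helper that pairs the `{"error": "..."}` body with an HTTP status code. The helper name `make_error_response`, the returned tuple shape, and the choice of `409 Conflict` for an incomplete job are all assumptions made for this example; the specification only fixes the body shape and requires a correct status code.

```python
import json
from http import HTTPStatus
from typing import Dict, Tuple


def make_error_response(status: HTTPStatus, message: str) -> Tuple[int, Dict[str, str], str]:
    """Build (status_code, headers, body) for an error on any endpoint.

    Hypothetical helper: only the {"error": "..."} body shape comes from
    the specification; everything else is an assumption.
    """
    if status.value < 400:
        raise ValueError("error responses need a 4xx or 5xx status code")
    body = json.dumps({"error": message})
    headers = {"Content-Type": "application/json"}
    return status.value, headers, body


# Assumed mapping: downloading the results of a job that is still running.
status_code, headers, body = make_error_response(
    HTTPStatus.CONFLICT, "job 150970 is not complete"
)
```

Keeping the body construction in one place makes it harder for individual endpoints to drift away from the shared error shape.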
## Technical Implementation

The implementation MUST be done on the GitHub repository, as a Pull Request.

### Message Queue

The implementation of the bulk verification feature uses message queues. The queue setup consists of 2 processes:

- the `web` process listens to incoming HTTP requests from end users. The API endpoints described in [API Endpoints Specification](#api-endpoints-specification) should be added to enable creating jobs.
- the `worker` process does the actual email verification.

The preferred backend for the message queue is PostgreSQL. Recent PostgreSQL versions make implementing queues much easier:

- `SKIP LOCKED`
- `LISTEN`/`NOTIFY`

Email verification results will also be stored in the SQL database, so that users can revisit the results long after the job finishes.

### Language and Libraries

As for libraries, there is flexibility. Some ideas:

- custom in-house implementation of message queue
- https://python-rq.org/
- https://huey.readthedocs.io/en/latest/

## Q&A

This section describes the quality assurance (QA) checks for the bulk email verification feature.

### Create a successful job with remote_url

Request:

```json=
POST /v0/bulk
{
    "input_type": "remote_url",
    "input": "https://gist.githubusercontent.com/amaurym/e6a4109a2b2c9baa42806f6953b12fb3/raw/261bf800f052d63bb98d0973486f61625d316db7/remote_qa.csv"
}
```

Expected Response:

```json=
{
    "job_id": 1
}
```

### Query job status

Request:

```json=
GET /v0/bulk/1
```

Expected Response:

When job is still running:

```json=
{
    "job_id": 1,
    "created_at": "2017-04-15T20:00:06.000Z",
    "finished_at": undefined,
    "total_records": 3,
    "total_processed": 2,
    "summary": {
        "total_safe": 0,
        "total_invalid": 1,
        "total_risky": 1,
        "total_unknown": 0
    },
    "job_status": "running"
}
```

When job is complete:

```json=
{
    "job_id": 1,
    "created_at": "2017-04-15T20:00:06.000Z",
    "finished_at": "2017-04-15T20:00:06.000Z",
    "total_records": 3,
    "total_processed": 3,
    "summary": {
        "total_safe": 0,
        "total_invalid": 1,
        "total_risky": 2,
        "total_unknown": 0
    },
    "job_status": "complete"
}
```

### Download Job Results

Request:

```json=
GET /v0/bulk/1/download
```

Expected response: https://gist.github.com_bulk_qa_results-csv

## Further Improvements

These improvements are to be done in future iterations of this feature, in separate Pull Requests.

- `DELETE /v0/bulk/<job_id>`: cancel a job.
- Prune old email results from the SQL database.