
DTC Api Extension

Hey o/

As this was a short project that is now deprecated and no longer works, this article will be brief.

Introduction

While working on a chatbot, I started integrating APIs to give the bot more content. One of the websites I wanted to use was DTC. DTC (short for DansTonChat) is a website that compiles short, funny IRC conversations which, because of the medium, revolve around 'geek' culture.

Unfortunately for my implementation, DTC didn't have an API at the time. To remedy this, I decided to create a custom API that scrapes the website on request. I should note that I asked Remouk, the administrator of DTC, for permission beforehand, to make sure this would not penalize the website.

Structure

Usually, an API relies on OAuth authentication to prevent spamming, and on custom database queries to search quickly and efficiently for the data the user asks for. Without direct access to the DTC database, one way to categorize and order the data is to scrape the whole website.

To avoid that, since it takes a lot of requests and I didn't want to overload the DTC website, we can instead parse only the relevant page for each request. In this case, we need to replicate the original structure of the website. The API I decided to go with implemented the following structure:

|- quote
    |- comment 
    |- random
|- users
    |- comments
    |- favoris

Each endpoint returns something. For example, quote returns all the information about a quote, given the id of the quote as a GET parameter.
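As a rough illustration of how such a structure can be mapped to handlers, here is a minimal sketch. The original project was written in PHP and its code is not shown here, so this uses Python purely for illustration. The `ROUTES` table, the handler names and the `resolve` function are all assumptions; only the paths come from the tree above.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical route table mirroring the tree above.
# The handler names on the right are illustrative, not the original ones.
ROUTES = {
    "/quote": "quote",
    "/quote/comment": "quote_comment",
    "/quote/random": "quote_random",
    "/users": "users",
    "/users/comments": "users_comments",
    "/users/favoris": "users_favoris",
}


def resolve(url):
    """Return (handler_name, params) for a request URL, or (None, {}) if unknown."""
    parsed = urlparse(url)
    # Tolerate a trailing slash such as "/quote/"
    path = parsed.path.rstrip("/") or "/"
    handler = ROUTES.get(path)
    if handler is None:
        return None, {}
    # parse_qs returns a list per key; keep the first value of each parameter
    params = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    return handler, params
```

With this sketch, a call such as `resolve("/quote?id=42")` gives the handler name for quote along with the id taken from the GET parameter, and an unknown path gives `None`.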

Parsing and encoding

When receiving a request for data, the API needs to parse the corresponding DTC page. To do so, we can use simple_html_dom.php by John Schlick. Since we know how the website is organized, this lets us scrape the data from the requested page.
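To give an idea of what this parsing step looks like, here is a minimal sketch. It does not use simple_html_dom.php: it swaps in Python's standard `html.parser` module, purely for illustration. The markup is an assumption too, since the real DTC pages are not reproduced in this article: each line of a quote is assumed to sit in a `<span class="line">` element with no nested spans.

```python
from html.parser import HTMLParser


class QuoteParser(HTMLParser):
    """Collect the text of every <span class="line"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.lines = []        # one entry per quote line found
        self._in_line = False  # True while inside a matching <span>
        self._buffer = []      # text fragments of the current line

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "line") in attrs:
            self._in_line = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "span" and self._in_line:
            self.lines.append("".join(self._buffer).strip())
            self._in_line = False

    def handle_data(self, data):
        # Character references such as &lt; are already decoded here
        if self._in_line:
            self._buffer.append(data)


def parse_quote(html):
    """Return the list of quote lines found in an HTML page."""
    parser = QuoteParser()
    parser.feed(html)
    parser.close()
    return parser.lines
```

The idea is the same as with simple_html_dom.php: because the layout of the page is known in advance, a selector (here, a tag and class check) is enough to pull out the pieces we care about.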

Then, once the data is parsed into separate variables, we can encode it as JSON to send it to the end user. Of course, we must not forget to set the Content-Type header of the response, so that it is interpreted as JSON.
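The encoding step can be sketched as follows, again in Python for illustration rather than the PHP of the original project. The `json_response` helper and the field names in the usage note are assumptions. The one fixed detail is the `application/json` content type, which is the standard media type for JSON.

```python
import json


def json_response(data):
    """Return (headers, body) for a JSON reply built from parsed data."""
    # ensure_ascii=False keeps accented characters readable in the output
    body = json.dumps(data, ensure_ascii=False).encode("utf-8")
    headers = {
        # Tells the client to interpret the body as JSON
        "Content-Type": "application/json; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return headers, body
```

For example, `json_response({"id": 42, "lines": ["<alice> hello"]})` gives the headers to send along with the encoded body. Without the Content-Type header, many clients would treat the reply as plain text or HTML.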