Saturday, September 26, 2020

Stressless in the new jungle

This is my personal journey buying on TuEnvio.

What is TuEnvio?

Described by CIMEX itself, TuEnvio is an “E-Commerce platform created by the CIMEX corporation for the national customer, which allows online purchases from the comfort of your home”. 

But you may wonder: why is this new? The fact is that the expansion of Internet access in Cuba is a recent phenomenon. Starting from almost zero, without infrastructure, Internet access has become almost a reality for many Cubans in a couple of years. True, it is very expensive thanks to ETECSA, but it continues to expand, which is good.

Since Cubans had no Internet access, “no one” worried about selling products online. At least, not to Cubans who live in Cuba. Instead, they “invented” a service called EnviosCuba so that families abroad could buy products for their relatives in Cuba. A kind of favor-based business model, which is very sad. An approach aimed only at capturing foreign currency, instead of also thinking about the prosperity and comfort of Cubans who live in Cuba.

But then SARS-CoV-2 arrived. They were forced to launch a service at a scale for which they were neither technologically nor logistically prepared. Its name: TuEnvio.


The new jungle

TuEnvio looked promising. Several instances of the store, distributed across some physical stores, showed their “stock” online. Users were able to navigate, search and buy. But something was not right. Buying what you needed wasn't exactly easy. Occasionally you could catch something, but the stress began to increase. Like a night watchman, you had to stay up late at night to buy a high-demand product.

TuEnvio doesn't have a native notification system, so I started implementing something to help me stay tuned. I was at home (remember COVID-19), I was bored, but most importantly I had to buy.

That was the birth of YourShipping.Monitor as a project.



The first step was implementing a basic scraping system to notify me when products became available, including some keyword searches. To improve the notification system, I also implemented a personal Telegram bot that allows me some basic interactions as well.
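As a rough illustration of that first step, here is a minimal sketch in Python (not the C# that the project actually uses). It assumes the page text has already been downloaded, matches keywords case-insensitively, and builds a Telegram Bot API `sendMessage` URL. The function names, the sample text, the token and the chat id are hypothetical.

```python
from urllib.parse import urlencode

TELEGRAM_API = "https://api.telegram.org"


def find_matches(page_text, keywords):
    """Return the keywords that appear in the page text (case-insensitive)."""
    lowered = page_text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def build_notification_url(bot_token, chat_id, text):
    """Build a Telegram Bot API sendMessage URL carrying the given text."""
    query = urlencode({"chat_id": chat_id, "text": text})
    return f"{TELEGRAM_API}/bot{bot_token}/sendMessage?{query}"


def check_page(page_text, keywords, bot_token, chat_id):
    """Return a notification URL if any keyword is found, otherwise None."""
    matches = find_matches(page_text, keywords)
    if not matches:
        return None
    message = "Available: " + ", ".join(matches)
    return build_notification_url(bot_token, chat_id, message)
```

Requesting the returned URL (for example with `urllib.request.urlopen`) would deliver the message; the sketch stops short of sending anything so it can be tried offline.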

So, the idea was to create an application similar to CamelCamelCamel, targeting TuEnvio. But everything would change when the new jungle arose.

Shoot first, ask later

The best description of the situation was published in this video: a "parodied" scene from the television series The Big Bang Theory. By the way, to understand what is happening you need to read the subtitles in Spanish ;). I'm not sure who the original author is, but it rocks. If you know who it is, please let me know so I can update this post.



It turned out that shopping at TuEnvio wasn't so easy. Only a few people saw the products, because they accessed them at the right time. Leaked links?

On the other hand, DATACIMEX's developers handled the workload generated by thousands of people accessing the store simultaneously with an incorrect caching approach. If you didn't see a product at the right time, you had to wait for the cache to be invalidated within the next 3 minutes.

This, combined with the limited supply, meant that the majority of TuEnvio's users were unable to purchase anything. Worse still, they didn't even see a single product.
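The effect of such a cache can be sketched with a toy time-to-live cache in Python. This is an assumption about the general mechanism, not DATACIMEX's actual implementation: the class, the 180-second value and the product name are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class TtlCache:
    """Cache a loaded value and refresh it only after ttl_seconds have passed."""

    def __init__(self, ttl_seconds, loader):
        self.ttl_seconds = ttl_seconds
        self.loader = loader
        self.value = None
        self.loaded_at = None

    def get(self, now):
        # Reload only on the first read or once the cached copy has expired.
        if self.loaded_at is None or now - self.loaded_at >= self.ttl_seconds:
            self.value = self.loader()
            self.loaded_at = now
        return self.value


def demo():
    stock = []                                # what the store really has
    cache = TtlCache(180, lambda: list(stock))

    seen_at_0 = cache.get(now=0)              # cache filled while the shelf is empty
    stock.append("chicken")                   # product published one second later
    seen_at_60 = cache.get(now=60)            # still served the stale empty list
    seen_at_180 = cache.get(now=180)          # cache expired, product finally visible
    return seen_at_0, seen_at_60, seen_at_180
```

In this toy model, a visitor arriving 60 seconds after the product is published still sees an empty store, and by the time the cache refreshes the limited stock may already be gone.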

Under these circumstances, YourShipping.Monitor's goals changed. I still needed the notifications. But above all, I needed to interact with the store at light speed to add products to the shopping cart.

I almost forgot that this is also a technical post. So, here we go.

Parallel web scraping

YourShipping.Monitor is implemented on the .NET Core full stack, including a Blazor frontend. It lets me track stores, departments, and products by their URLs. The user enters a link, and a background process extracts the information and also tries to interact with the store's options under a single rule: add a product to the cart at first sight.

But what if I'm looking at the wrong department? What if one product becomes available in the very same second as another? This is why it was important to send as many requests as possible at the same time. Using the asynchronous capabilities of C# in combination with the AsyncEnumerable library, I was able to do it, just like this.
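The original C# snippet is not reproduced in this text. As a rough illustration of the same idea, here is a minimal sketch in Python using `asyncio` instead of C# and AsyncEnumerable. The `fetch_all` and `fake_fetch` names are hypothetical, `fake_fetch` stands in for a real HTTP call, and the concurrency cap is an assumption of this sketch rather than something the post describes.

```python
import asyncio


async def fetch_all(urls, fetch, max_concurrency=10):
    """Run fetch(url) for every URL concurrently, at most max_concurrency at a time.

    Returns (url, result) pairs in the same order as the input URLs.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            try:
                return url, await fetch(url)
            except Exception as error:
                # One failing page should not cancel the other requests.
                return url, error

    return await asyncio.gather(*(guarded(url) for url in urls))


async def fake_fetch(url):
    """Stand-in for a real HTTP request (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"content of {url}"
```

Calling `asyncio.run(fetch_all(urls, fake_fetch))` issues all requests in one batch, so a product appearing in one department is not missed while another department is still loading.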



But it wasn't just me. A community of Cuban developers launched several applications to help people buy. Even though such applications required user interaction, their workload hit the store's servers hard. So, CIMEX responded with an anti-scraping approach.

Fighting against the anti-scraping system

One day the scraper stopped working. All requests were redirected to a page that executed this JavaScript code.



It was easy to figure out what was happening. They expected a cookie, with a value generated by that JavaScript. I was already using AngleSharp to explore the DOM elements. Would it be possible to evaluate such a function, to obtain the cookie value, using the same library? The answer is yes. AngleSharp.Js is an experimental extension that allows you to run simple JavaScript functions. So, after capturing the parameters with a regex, I was able to call the function to get the cookie value as well.
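The actual challenge script is not reproduced in this text, so the following Python sketch uses an entirely hypothetical challenge page to illustrate the two steps: capture the call parameters with a regex, then compute the cookie value. Note the swapped-in technique: instead of evaluating the JavaScript as AngleSharp.Js does, the sketch re-implements the made-up `challenge` function by hand. The page, the function, the formula and the cookie name are all assumptions.

```python
import re

# Hypothetical challenge page; the real TuEnvio script is not shown in this post.
SAMPLE_PAGE = """
<script>
function challenge(a, b) { return (a * 31 + b) % 65521; }
document.cookie = "token=" + challenge(1234, 5678);
</script>
"""

# Matches only the call with numeric arguments, not the function definition.
CALL_PATTERN = re.compile(r"challenge\((\d+),\s*(\d+)\)")


def challenge(a, b):
    """Python port of the hypothetical JavaScript function above."""
    return (a * 31 + b) % 65521


def solve_challenge(html):
    """Return the cookie expected by the page, or None if no challenge is found."""
    match = CALL_PATTERN.search(html)
    if match is None:
        return None
    a, b = (int(group) for group in match.groups())
    return {"token": str(challenge(a, b))}
```

The returned dictionary would then be sent as a cookie on the next request. Porting the function by hand breaks as soon as the site changes its script, which is one reason to evaluate the JavaScript itself instead.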


Moving to unattended mode 

At this point, I was creating the session in the browser, saving the cookies.txt file, and making it available to the scraping server (a.k.a. YourShipping.Monitor.Server). The main reason: the captcha. But TuEnvio's captcha looks like this.




Actually, it doesn't look like a very hard captcha. Nothing that hasn't been broken before with tesseract-ocr. So I just added a reference to a .NET wrapper for Tesseract and wrote this


and you know what? It worked.

Final thoughts  

I know, this doesn't seem stress-free at all, but yes, now it is. With YourShipping.Monitor and a bit of luck, I have been able to catch something in TuEnvio's stores. There is no guarantee, so I always insist that ETECSA should not charge for access to virtual stores. You can spend more money trying to buy than on the purchase itself.


Recently, CIMEX released the stores' opening schedule. So now, combined with my command-line tool nauta-session to manage Nauta Hogar sessions, I can finally go to sleep, stressless 😉.

6 comments:

  1. It worked, I'm going to post it on LinkedIn and everything!

  2. So, ethics? What you are doing is basically a robot
    "Colero". When you say "send as many requests as possible at the same time", that's the definition of a DDoS attack. It is just not fair play for CIMEX, and worse for all the Cubans who don't have your technical expertise. It is really comforting for me that CIMEX has read this post and implemented an image-based captcha, the hardest to break. Spare a little thought for the others.

    Replies
    1. In fact, I posted this because I think of others. I am in contact with DATACIMEX developers. They know what I did.

      I suggested turning TuEnvio into TuLibreta, because the problem is less technological and more logistical. You can't sell what you don't have. Less than 5% of active users of TuEnvio can purchase successfully. https://twitter.com/alexfdezsaucoCU/status/1302661299839664129

      What is your opinion of applications like ComprandoEnCuba? It is not developed by DATACIMEX and it also generates a lot of requests. In fact, the thousands of users of such an app effectively generate a DDoS attack. DATACIMEX has to deal with this.

      DATACIMEX implemented a high availability cluster for read only operations for SQLServer and EntityFramework with my help https://twitter.com/alexfdezsaucoCU/status/1296229260722569216.

      So yes, this is ethical, because it is public. Many developers do this and hide the results, or worse they are reselling the service (not me). Many people criticized me because I published this, precisely because of the expected reaction from DATACIMEX.

      I only have one instance of my program running and, believe me, there is no guarantee. I'm just trying to do this stress-free, because buying on TuEnvio is totally unproductive.

      Sorry if you misunderstood the intention of my post.

      BTW, DATACIMEX already knows that I also broke the second captcha
      https://twitter.com/alexfdezsaucoCU/status/1311423975747186689
      and yes, it is public. So, sorry if the comfort lasted so little.

      Believe me when I say that I am not the biggest problem for DATACIMEX.

  3. Alexánder Fernández Saúco, my respects!!!
    Like many other programmers, I have been bored at home during Covid and I also went through the same adventures with TuEnvio. I developed a similar program but I did not get very far...
    Having read your article and the comments, I congratulate you on the work done, and let DATACIMEX figure out how to solve the multiple-requests problem... if they put up a thousand captchas, more enterprising people will appear to break them...
    I wonder, does DATACIMEX receive more requests than Facebook? Surely not, so they should learn how those who know do it...
    Thanks once again for your contribution. Greetings from Holguín.

