like was told to me

Saturday, September 26, 2020

Stressless in the new jungle

This is my personal journey of buying on TuEnvio.

What is TuEnvio?

Described by CIMEX itself, TuEnvio is an “E-Commerce platform created by the CIMEX corporation for the national customer, which allows online purchases from the comfort of your home”.

But you may wonder: why is this new? The fact is that the expansion of Internet access in Cuba is a recent phenomenon. Starting from almost zero, with no infrastructure, Internet access has become almost a reality for many Cubans within a couple of years. Granted, it is very expensive thanks to ETECSA, but it continues to expand, which is good.

Since Cubans had no Internet access, “no one” worried about selling products online. At least, not to Cubans who live in Cuba. Instead, they “invented” a service called EnviosCuba so that families abroad could buy products for their relatives in Cuba. A kind of favor-based business model, which is very sad. An approach aimed only at capturing foreign currency, instead of also thinking about the prosperity and comfort of Cubans who live in Cuba.

But then SARS-CoV-2 arrived. They were forced to launch a service on a scale for which they were neither technologically nor logistically prepared. Its name: TuEnvio.

The new jungle

TuEnvio looked promising. Several instances of the store, each tied to a physical store, showed their “stock” online. Users were able to navigate, search, and buy. But something was not right. Buying what you needed wasn't exactly easy. Eventually you could catch something, but the stress kept increasing. Like a night watchman, you had to stay up late at night to buy a high-demand product.

TuEnvio doesn't have a native notification system, so I started implementing something to help me stay tuned. I was at home (remember COVID-19), I was bored, but most importantly, I had things to buy.

That was the birth of YourShipping.Monitor as a project.

The first step was implementing a basic scraping system to notify me when products became available, including some keyword searches. To improve the notifications, I also implemented a personal Telegram bot that allows me some basic interactions.
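To give an idea of what the keyword search does, here is a minimal Python sketch (the project itself is written in C#). It assumes product names sit in elements with a `product-name` CSS class; that class name, the sample page, and the helper names are made up for illustration and are not TuEnvio's real markup.

```python
from html.parser import HTMLParser


class ProductNameParser(HTMLParser):
    """Collects the text of every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.names = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self._capturing = True
            self.names.append("")

    def handle_endtag(self, tag):
        # Good enough for flat markup; nested tags would need a depth counter.
        self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.names[-1] += data.strip()


def find_matches(html, keywords, css_class="product-name"):
    """Returns the product names that contain any of the keywords."""
    parser = ProductNameParser(css_class)
    parser.feed(html)
    wanted = [k.lower() for k in keywords]
    return [n for n in parser.names if any(k in n.lower() for k in wanted)]


# Invented sample page, standing in for a downloaded department listing.
SAMPLE_PAGE = """
<ul>
  <li><a class="product-name" href="/p/1">Agua Natural 1.5 L</a></li>
  <li><a class="product-name" href="/p/2">Aceite de Girasol</a></li>
</ul>
"""

matches = find_matches(SAMPLE_PAGE, ["agua"])
print(matches)
```

Every match would then be pushed to the Telegram bot as a notification.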

@cimex_cuba #TuEnvio #Telegram #Bot #Demo 😉

/search agua pic.twitter.com/3fTR2z3hIp
— Alexánder Fernández (@alexfdezsaucoCU) May 31, 2020

So, the idea was to create an application similar to CamelCamelCamel, but targeting TuEnvio. But everything changed when the new jungle arose.

Shoot first, ask later

The best description of the situation was published in this video, a "parodied" scene from the television series The Big Bang Theory. By the way, to understand what is happening you need to read the subtitles in Spanish ;). I'm not sure who the original author is, but it rocks. If you know who it is, please let me know so I can update this post.

It turns out that shopping at TuEnvio wasn't easy at all. Only a few people got to see the products, because they accessed them at exactly the right time. Leaked links?

On the other hand, DATACIMEX's developers handled the workload generated by thousands of people accessing the store simultaneously with an incorrect caching approach. If someone doesn't see a product at the right time, they have to wait for the cache to be invalidated within the next 3 minutes.
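I don't know how the cache is actually implemented on their side, but the effect is the same as with any time-to-live (TTL) cache. A minimal Python sketch, assuming a 180-second TTL and a fake clock (all names here are mine, not theirs):

```python
class TtlCache:
    """Serves a cached value until it is older than ttl_seconds."""

    def __init__(self, ttl_seconds, clock):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get(self, key, load):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still "fresh": the real stock is never consulted
        value = load()
        self._entries[key] = (now, value)
        return value


clock = {"now": 0.0}   # fake clock, in seconds
stock = []             # what the store really has
cache = TtlCache(ttl_seconds=180, clock=lambda: clock["now"])


def load_department():
    return list(stock)


first_view = cache.get("department", load_department)   # cached while empty

stock.append("product")                                  # released at t = 10 s
clock["now"] = 10.0
stale_view = cache.get("department", load_department)   # still the empty page

clock["now"] = 181.0
fresh_view = cache.get("department", load_department)   # finally reloaded

print(first_view, stale_view, fresh_view)
```

Whoever hits the page right after a reload sees the product; everyone else keeps getting the empty page until the entry expires.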

The stale cache, combined with the limited supply, meant that the majority of TuEnvio's users were unable to purchase anything. Worse still, they didn't even see a single product.

Under these circumstances, YourShipping.Monitor's goals changed. I still needed the notifications. But what I really needed was to interact with the store at light speed to add products to the shopping cart.

I almost forgot that this is also a technical post. So, here we go.

Parallel web scraping

YourShipping.Monitor is being implemented on the full .NET Core stack, including a Blazor frontend. It lets me track stores, departments, and products by their URLs. The user enters a link, and a background process extracts the information and also tries to interact with the store's options under a single rule: add a product to the cart at first sight.

But what if I'm looking at the wrong department? What if one product becomes available in the very same second as another? This is why it was important to send as many requests as possible at the same time. Using the asynchronous capabilities of C# in combination with the AsyncEnumerable library, I was able to do it, just like this.
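For readers who don't follow C#, the same idea can be sketched in Python with `asyncio`. This is an illustration only, not the project's code: `fetch` is a stand-in that sleeps instead of touching the network, the URLs are placeholders, and the concurrency limit of 8 is an arbitrary assumption.

```python
import asyncio


async def fetch(url):
    """Stand-in for an HTTP GET; sleeps instead of touching the network."""
    await asyncio.sleep(0.01)
    return f"content of {url}"


async def fetch_all(urls, max_concurrency=8):
    """Requests every URL concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        async with semaphore:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return dict(pairs)


# Placeholder URLs for the tracked departments.
urls = [f"https://store.example/department/{i}" for i in range(20)]
pages = asyncio.run(fetch_all(urls))
print(len(pages))
```

The semaphore is there so the scraper doesn't open an unbounded number of connections at once.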

But it wasn't just me. A community of Cuban developers launched several applications to help people buy. Even though such applications required user interaction, the workload hit the store's servers hard. So, CIMEX responded with an anti-scraping approach.

Fighting against the anti-scraping system

One day the scraper stopped working. All requests were redirected to a page to execute this JavaScript code.

It is easy to figure out what is happening. They expect a cookie, with a value generated by that JavaScript. I was already using AngleSharp to explore the DOM elements. Might it be possible to evaluate such a function, to obtain the cookie value, using the same library? The answer is yes. AngleSharp.Js is an experimental extension that allows you to run simple JavaScript functions. So, after capturing the parameters with a regex, I was able to call the function and capture the cookie value as well.
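The capture step can be sketched in Python as follows. Everything about the challenge here is invented for illustration: the page markup, the function name `challenge`, its arithmetic, and the cookie name `token` are not the store's real code. Also note that this sketch re-implements the function in Python, whereas I evaluated the original JavaScript with AngleSharp.Js.

```python
import re

# Hypothetical challenge page: markup, function name, arithmetic, and
# cookie name are all invented for this sketch.
CHALLENGE_PAGE = """
<script>
  function challenge(a, b) { return (a * b) % 9973; }
  document.cookie = "token=" + challenge(1234, 567) + "; path=/";
  location.reload();
</script>
"""

# Captures the cookie name and the two numeric arguments of the call.
CALL_PATTERN = re.compile(r'"(\w+)="\s*\+\s*challenge\((\d+),\s*(\d+)\)')


def challenge(a, b):
    """Python stand-in for the invented JavaScript function above."""
    return (a * b) % 9973


def solve(page):
    """Returns (cookie_name, cookie_value), or None if there is no challenge."""
    match = CALL_PATTERN.search(page)
    if match is None:
        return None
    name, a, b = match.group(1), int(match.group(2)), int(match.group(3))
    return name, str(challenge(a, b))


cookie = solve(CHALLENGE_PAGE)
print(cookie)
```

With the cookie attached to the session, the original request can simply be repeated.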

Moving to unattended mode

At this point, I was creating the session with the browser, saving the cookies.txt file, and making it available to the scraping server (a.k.a. YourShipping.Monitor.Server). The main reason: the captcha. But TuEnvio's captcha looks like this.

Actually, it doesn't look like a very hard captcha. Nothing that has not been broken before with tesseract-ocr. So, I just added a reference to a .NET wrapper of Tesseract and wrote this

and you know what? It worked.

Final thoughts

I know, this doesn't sound stress-free at all, but yes, now it is. With YourShipping.Monitor and a bit of luck, I have been able to catch something in TuEnvio's stores. There is no guarantee, so I always insist that ETECSA should not charge for access to virtual stores. You can end up spending more money trying to buy than on the purchase itself.

Recently, CIMEX released the stores' opening schedule. So now, by combining it with my command-line tool nauta-session, which manages Nauta Hogar sessions, I can finally go to sleep, stressless 😉.

It worked out. Now I can go to bed and sleep 😉. pic.twitter.com/Xe1UwFYnPC
— Alexánder Fernández (@alexfdezsaucoCU) September 25, 2020

Tuesday, January 7, 2020

Getting started with Blorc.PatternFly

Originally published on the PatternFly Medium publication

If you’re a developer who loves hands-on tactical tutorials, then read on. Today, we’re covering Blorc.PatternFly.

First off, the basics: What is Blorc.PatternFly? Standing for Blazor, Orc, and PatternFly, Blorc.PatternFly is a library with the ultimate goal of wrapping all PatternFly components and making them available as Blazor components.

Now let’s jump into a tutorial. Keep in mind that this tutorial isn’t meant as an overview of Blazor — you’ll need some basic knowledge of Blazor before diving in.

You’ll also need to have these tools handy:

Visual Studio 2019 (16.4.2)
Blazor (3.1.0-preview4.19579.2)

Step 1: Creating the project

First, go through the Get started with ASP.NET Core Blazor tutorial for Blazor WebAssembly experience. You’ll create the Blazor project in this tutorial, and you’ll only have to convert the Bootstrap to PatternFly. For the purpose of this guide, use Blorc.PatternFly.QuickStart as the project name.

Follow the on-screen instructions of the Visual Studio project:

Create a new project.

Configure your new project.

Create a new Blazor app with Blazor WebAssembly experience.

The Blazor template is built on top of Bootstrap. So the resulting app looks like this:

Index.razor and SurveyPrompt.razor

Counter.razor

FetchData.razor

From here, you’ll replace the Bootstrap look and feel with the PatternFly one.

Step 2: Startup configuration

Once the project has been created, add Blorc.PatternFly as a package reference via NuGet. At the time of writing this article (which I hope you’re enjoying!), this package is only available as prerelease. To install the latest prerelease version, check the Include prerelease option in the Package Manager.

Adding latest prerelease of Blorc.PatternFly package

Also, it’s mandatory to register the Blorc.Core services in the ConfigureServices method of the Startup class, shown below:

Once the Blorc services are registered, it’s time to start replacing the UI elements, starting with the content of the index.html and site.css files.

To make sure that no unused dependencies are being deployed, remove the bootstrap and open-iconic directories from the wwwroot/css directory.

Step 3: Updating pages and components

The time has come to update the components. You should be able to update the content of the razor files with references to the available Blorc.PatternFly components.

You can do this yourself by following the steps below or you can clone the repository with the source code of this tutorial.

For instance, the MainLayout component must inherit from PatternFlyLayoutComponentBase, and you can use the Page component as follows:

For the NavMenu, you could use the Navigation component and update the razor file as shown below:

Finally, update the content of the Counter and FetchData pages.

And that’s it! Great work. Your application should now look like the screenshots below:

Index.razor and SurveyPrompt.razor

Counter.razor

FetchData.razor

Send us your feedback

Keep in mind that the library is a work in progress, and there are still a few PatternFly components being implemented. We are continuously releasing new versions. The good news is that Blorc.PatternFly is open source, and the sources are available on GitHub.

If you would like support for any new component, contribute to the Blorc.PatternFly library on GitHub. You can get in touch by:

Creating tickets.
Contributing by pull requests.
Contributing via Open Collective.

Finally, if you want to see the latest develop branch of Blorc.PatternFly in action, you can browse to the live demo with a full overview of all the PatternFly components already available for Blazor. And you’ll probably agree: PatternFly and Blazor are awesome — and combined, they are a beautiful pair.

Monday, December 16, 2019

How to avoid copying movies that you will never play?

Introduction

This is a kind of odd title for a technical post. But yes, it is a technical post. Actually, it doesn't look like a real problem. But yes, it is a real one.

It turns out that I have a compulsion, an obsession, to watch movies. Or rather, to copy and organize movies on my personal storage. But some of those movies will never be played.

Recently, I also noticed that I am running out of space. The obvious way out would be to delete all the movies that I never played or that I really don't like.

But I could also try something else and take some advantage of this situation, something more productive for a Sunday afternoon because at some point I will be in the same position again.

The root of all evil

In Cuba, Internet access is very expensive. You can check the prices yourself on the official website of the only Internet service provider (a.k.a. ETECSA). Therefore, regular Cubans don't use Netflix, nor do they use the Internet to download large multimedia files (at least not from home).

This situation has created a unique business model that probably only works in Cuba: an offline alternative to a media service provider, code-named "El paquete" (the package).

I will not give you too many details about this service. All you need to know is that the package distributes a lot of movies every week via USB drives. The media content includes the latest premieres as pirated cinema copies, improved cinema copies, HD copies with Chinese subtitles, Full-HD versions, classic movies, animated movies, cycles dedicated to a specific actor, and so on. The package also includes some television programs, series, sports, contests, etc. About 1 TB per week in media files.

But my personal OCD is about movies, and I copy them all. This is not exactly a healthy approach for my very limited personal storage.

Everything got "worse" when I met Emby

Emby is a media server designed to organize, play, and stream audio and video to a variety of devices, as you can read here. Therefore, my movie-copying routine now includes downloading all the movie metadata: the original title, the tag line, poster and backdrop images, the cast, community rating, critic rating, genres, and all the other information available from sites like IMDb or TheMovieDB. It is stored in the server database and also in local nfo files next to each movie file.

These metadata enrich the user experience and are displayed when someone browses the media content from a client like Emby for Roku, directly from the TV.

Spider-Man: Into the Spider-Verse (Emby for Roku)

As you can also notice in the picture above, Emby also tracks the movies that I already played. Wait a second. That looks like a perfect ground truth to be used to solve a classification problem.

Deep learning to the rescue

Sundays are good days to spend time with the family and watch movies. But I couldn't find the right one yesterday. I'm also close to zero free space for the next release of the package.

So, I just needed to try something deep ;). Something that could work as a long-term approach.

Yes, I know. I haven't written too much on this blog for a while. But remember I'm training Alexa every day, and she demands a lot of my time ;). She only left me time to publish Computing Anomaly Score Threshold with Autoencoders Pipeline and then I completely forgot to comment about it here. But that will be the subject of the next post (or the next one). So, let's go back to the movies.

The Emby server has an SQLite database (library.db). I explored the data and extracted all the information useful for my problem with a simple join of two tables, MediaItems and UserDatas.

Sample of extracted data from Emby database
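The join can be sketched with Python's built-in `sqlite3` module. The table names come from library.db, but the column names and the two sample rows below are assumptions made for illustration; the real schema has many more columns and may name these differently.

```python
import sqlite3

# In-memory stand-in for library.db; columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MediaItems (Id INTEGER PRIMARY KEY, Name TEXT,
                             OfficialRating TEXT, CommunityRating REAL,
                             CriticRating REAL, Genres TEXT);
    CREATE TABLE UserDatas (ItemId INTEGER, Played INTEGER);
    INSERT INTO MediaItems VALUES (1, 'Movie A', 'PG', 8.4, 97, 'Animation|Action');
    INSERT INTO MediaItems VALUES (2, 'Movie B', 'R', 5.1, 40, 'Horror');
    INSERT INTO UserDatas VALUES (1, 1);
""")

# LEFT JOIN keeps movies without any user data and labels them as not played.
rows = conn.execute("""
    SELECT m.Name, m.OfficialRating, m.CommunityRating, m.CriticRating,
           m.Genres, COALESCE(u.Played, 0) AS Played
    FROM MediaItems AS m
    LEFT JOIN UserDatas AS u ON u.ItemId = m.Id
    ORDER BY m.Id
""").fetchall()

for row in rows:
    print(row)
```

The last column is the label: 1 for a movie I played, 0 otherwise.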

At this point, I thought it was a good time to try the ML.NET Model Builder (Preview), but the extension size is about 150 MB. Too large for a Sunday at home. The .NET solution to this problem has to wait until I finish writing this post, or maybe until next weekend.

Deeplearning4j (DL4J) is already cached on my local nexus. So, here we go.

Let's keep this straightforward

There is enough documentation about DL4J, even a book, Deep Learning: A Practitioner's Approach. So, this will be fast. I will try not to repeat any step that is available online, but you will probably notice some resemblance to Paul Dubs's excellent quick-start tutorial, since this is exactly a classification problem.

Yes, in case you haven't noticed yet, this is a classification problem, and quite a simple one. I have to predict whether I will play a movie from the following features: Official Rating, Community Rating, Critic Rating, and Genres, in correlation with my own playback action.

First, I split the existing data: 80% of the full data set for training and 20% for evaluation. I stored the analysis of the full data set so that both subsets are normalized using the same statistics.
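In plain Python, the idea looks roughly like this. It is a sketch, not the DL4J code: I assume z-score standardization (mean and standard deviation per column), a fixed shuffle seed, and made-up numeric rows.

```python
import random
import statistics


def split(rows, train_fraction=0.8, seed=42):
    """Shuffles a copy of the rows and cuts it into training and evaluation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def analyze(rows):
    """Per-column (mean, standard deviation) over the full data set."""
    columns = list(zip(*rows))
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    return [(statistics.fmean(c), statistics.pstdev(c) or 1.0) for c in columns]


def normalize(rows, analysis):
    """Standardizes every value with the stored per-column statistics."""
    return [[(v - mean) / std for v, (mean, std) in zip(row, analysis)]
            for row in rows]


# Made-up (community rating, critic rating) pairs, only for illustration.
data = [[6.0 + 0.1 * i, 50.0 + i] for i in range(10)]

analysis = analyze(data)               # computed once, on the full data set
train, evaluation = split(data)
train_norm = normalize(train, analysis)
evaluation_norm = normalize(evaluation, analysis)

print(len(train_norm), len(evaluation_norm))
```

Because both subsets share the same analysis, a given rating maps to the same normalized value whether it ends up in training or in evaluation.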

Then I transformed the data using DataVec as follows:

Followed by this network configuration:

Finally, I set up the early stopping trainer to save the best model:
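The logic behind early stopping is simple enough to sketch in a few lines of Python. This is not the DL4J trainer or its API: the patience-based stop rule, the function names, and the toy scores are assumptions for illustration.

```python
import copy


def train_with_early_stopping(model, train_epoch, evaluate,
                              max_epochs=100, patience=5):
    """Trains until the score stops improving; returns the best snapshot."""
    best_score = float("-inf")
    best_model = None
    epochs_without_improvement = 0

    for _ in range(max_epochs):
        train_epoch(model)
        score = evaluate(model)
        if score > best_score:
            best_score = score
            best_model = copy.deepcopy(model)   # "save" the best model so far
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break

    return best_model, best_score


# Toy stand-ins: the "model" only counts epochs, and the scores are scripted.
scores = iter([0.60, 0.70, 0.75, 0.74, 0.73, 0.72, 0.90])
model = {"epochs": 0}


def train_epoch(m):
    m["epochs"] += 1


def evaluate(m):
    return next(scores)


best_model, best_score = train_with_early_stopping(
    model, train_epoch, evaluate, patience=3)
print(best_model, best_score)
```

Training stops after three epochs without improvement, and the snapshot taken at the best score is the one that gets kept.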

And done.

The results

Well, the results are quite impressive, and also suspicious. But there is no problem at all. The network perfectly isolates the movies that I already played in the evaluation data set.

Played movies from the evaluation data set.

Now, I'm ready for the next release of the package.

Wait a second. I just remembered that I have an isolated copy of last week's package, with 58 movies in the inbox, already processed by Emby. After running the prediction program, the assistant neural network (the result of the training process) recommends that I copy only 7 movies. Yes, I can deal with that.

Prediction over last week's package

Not too bad for a Sunday, right? But it probably requires some tuning (or watching more movies). I'm not sure that the adversary network (myself) will allow me to ignore Ad Astra. Or will it? ;)