Sunday, August 29, 2021

X-ray StoneAssemblies.MassAuth with NDepend

Introduction

A long time ago, I wrote the post Why should you start using NDepend?, which I consider the best post I have ever written, or almost ;)

NDepend is a static analysis tool for .NET. It helps us analyze code without executing it and is generally used to ensure conformance with coding guidelines. As its authors like to say, it will likely find hundreds or even thousands of issues affecting your codebase.

After the first pre-releases of StoneAssemblies.MassAuth, I decided to X-ray it with NDepend. I attached a new NDepend project to the StoneAssemblies.MassAuth solution, filtered out the test and demo assemblies, and hit the analyze button.

Attach new NDepend Project to Visual Studio Solution 

Now, I invite you to interpret some results with me. So, let's start.

Interpreting the NDepend analysis report

One of the outputs of the analysis is a web report. The main page includes a report summary with Diagrams, Application Metrics, Quality Gates summary, and Rules summary sections.

NDepend Report Summary

Navigation Menu

It also includes a navigation menu to drill through more detailed information, including Quality Gates, Rules, Trend Charts, Metrics, Dependencies, Hot Spots, Object-Oriented Design, API Breaking Changes, Code Coverage, Dead Code, Code Diff Summary, Build Order, Abstractness vs. Instability, and Analysis Log.


But let's take a look at these summary sections.

Diagrams

The diagrams section includes the Dependency Graph, Dependency Matrix, Treemap Metric View, and Abstractness vs. Instability.


Dependency Graph

Dependency Matrix

Treemap Metric View (Color Metric Coverage)

So far, I understand these metrics; actually, I can interpret them relatively easily.

But wait a second. What is this Abstractness versus Instability?

Abstractness versus Instability

According to the documentation, the Abstractness versus Instability diagram helps detect which assemblies are potentially painful to maintain (i.e. concrete and stable) and which are potentially useless (i.e. abstract and unstable).

Abstractness: If an assembly contains many abstract types (i.e. interfaces and abstract classes) and few concrete types, it is considered abstract.

Instability: An assembly is considered stable if its types are used by a lot of types from other assemblies. In this context, stable means painful to modify.
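For reference, these are the standard Robert C. Martin metrics as NDepend documents them; the normalized distance form below is consistent with the values in the table that follows:

\[
A = \frac{N_{\text{abstract types}}}{N_{\text{types}}}, \qquad
I = \frac{C_e}{C_a + C_e}, \qquad
D = \frac{|A + I - 1|}{\sqrt{2}}
\]

where \(C_e\) (efferent coupling) counts outgoing dependencies, \(C_a\) (afferent coupling) counts incoming ones, and \(D\) measures how far an assembly sits from the "main sequence" line \(A + I = 1\).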


Component                                 | Abstractness (A) | Instability (I) | Distance (D)
StoneAssemblies.MassAuth                  | 0                | 0.96            | 0.03
StoneAssemblies.MassAuth.Rules            | 1                | 0.4             | 0.28
StoneAssemblies.MassAuth.Messages         | 0.43             | 0.62            | 0.04
StoneAssemblies.MassAuth.Hosting          | 0.14             | 0.99            | 0.09
StoneAssemblies.MassAuth.Rules.SqlClient  | 0.17             | 1               | 0.12
StoneAssemblies.MassAuth.Server           | 0                | 1               | 0
StoneAssemblies.MassAuth.Proxy            | 0                | 1               | 0

None of StoneAssemblies.MassAuth's components seem potentially painful to maintain or potentially useless. But I have to keep my eyes on StoneAssemblies.MassAuth.Rules because of its distance (D) from the main sequence. Another candidate to review is StoneAssemblies.MassAuth.Messages, since ideal assemblies are either completely abstract and stable (I=0, A=1) or completely concrete and unstable (I=1, A=0).


Application Metrics 

The application metrics section shows the following summary.

Application Metrics

It looks like some metrics depend on a comparison baseline. Since this is the first time I have run NDepend's analysis over the StoneAssemblies.MassAuth code, it hasn't noticed any difference from previous executions. But this helps me stay alert about the low 66.99% test coverage, the 3 quality gate failures, the 2 critical rules violated, and the 15 high issues.


Quality Gates

In this section, it is possible to view more details on the failing quality gates: the percentage of coverage, the critical rules violated, and the debt rating per namespace.

Quality Gates

Wait a second again. What does debt rating per namespace mean?

According to the documentation, the rule forbids namespaces with a poor debt rating. By default, a debt ratio greater than 20% is considered poor.

Namespace                                      | Debt Ratio (%) | Issues
StoneAssemblies.MassAuth.Services              | 39.01          | 8
StoneAssemblies.MassAuth.Services.Attributes   | 37.62          | 3
StoneAssemblies.MassAuth.Server                | 23.67          | 6
StoneAssemblies.MassAuth.Rules.SqlClient.Rules | 36.24          | 3

Rules summary

The final section is the Rules summary. It lists the issues per rule in the following table.

Rules summary

NDepend tells me that I violated 2 critical rules: one to avoid mutually dependent namespaces, and the other to avoid having different types with the same name. That sounds weird, but who knows, even I can make mistakes ;)
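To picture the first violation, this is the kind of cycle that rule flags (a hypothetical example, not the actual offending code):

```csharp
// Hypothetical illustration of mutually dependent namespaces.
namespace StoneAssemblies.MassAuth.A
{
    public class Foo
    {
        // types in namespace A use types in namespace B...
        public B.Bar Bar { get; set; }
    }
}

namespace StoneAssemblies.MassAuth.B
{
    public class Bar
    {
        // ...and types in namespace B use types in namespace A: a cycle.
        public A.Foo Foo { get; set; }
    }
}
```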


What's next?

In the near future, I will be integrating the NDepend analysis into the build process, so I can easily share with you the evolution of this library in terms of quality. If you are interested in such an experience, wait for the next post.

As you already know, StoneAssemblies.MassAuth is a work in progress, which includes some unresolved technical debt. But, as I told you once, it is possible to make mistakes (critical or not); being constantly aware of your code quality is what makes the difference between the apparent and the intrinsic quality of your sources. If you are a .NET developer, NDepend is a great tool to stay aware of your code quality.

Yeah, you are right, I have some work to do here to fix all this as soon as possible. But also remember, StoneAssemblies.MassAuth is an open-source project, so you are welcome to contribute ;)

Sunday, August 22, 2021

StoneAssemblies.MassAuth Hands-on Lab

A few days ago, I introduced StoneAssemblies.MassAuth as a Gatekeeper implementation.

Today, as promised, I bring you a hands-on lab that consists of the following steps:

  1. Set up the workspace
  2. Contract first
  3. Implementing rules
  4. Implementing services
  5. Hosting rules
  6. Build, run and test
So, let's get straight to it.

Prerequisites

  • Visual Studio 2019 (16.9.3)
  • Docker (2.3.0.4)
  • Cake (1.1.0)
  • Tye (0.9.0-alpha.21380.1)

Step 1: Set up the workspace

To set up the workspace, open a PowerShell console and run the following commands:
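The original commands are not reproduced here, but a sketch of the workspace setup with the dotnet CLI could look like this (the project names come from the list below; the exact commands and package references in the lab may differ):

```powershell
dotnet new sln -n StoneAssemblies.MassAuth.QuickStart
dotnet new classlib -n StoneAssemblies.MassAuth.QuickStart.Messages
dotnet new classlib -n StoneAssemblies.MassAuth.QuickStart.Rules
dotnet new webapi -n StoneAssemblies.MassAuth.QuickStart.Services
dotnet new web -n StoneAssemblies.MassAuth.QuickStart.AuthServer

dotnet sln StoneAssemblies.MassAuth.QuickStart.sln add `
    StoneAssemblies.MassAuth.QuickStart.Messages `
    StoneAssemblies.MassAuth.QuickStart.Rules `
    StoneAssemblies.MassAuth.QuickStart.Services `
    StoneAssemblies.MassAuth.QuickStart.AuthServer
```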


After executing these commands, the StoneAssemblies.MassAuth.QuickStart.sln Visual Studio solution file is created, which includes the following projects:
  • StoneAssemblies.MassAuth.QuickStart.Messages: Class library for the message specifications.
  • StoneAssemblies.MassAuth.QuickStart.Rules: Class library to implement rules for messages.
  • StoneAssemblies.MassAuth.QuickStart.Services: Web API to host the services that require authorization by rules.
  • StoneAssemblies.MassAuth.QuickStart.AuthServer: Authorization server to host the rules for messages.
The commands also add the required NuGet packages and project references.

If you review the content of the StoneAssemblies.MassAuth.QuickStart.AuthServer.csproj project file, you should notice a package reference to StoneAssemblies.Extensibility. This is required because all rules will be provisioned as plugins for the authorization server. 

The extensibility system is NuGet-based, so we need to set up the build to provision the rules and messages as NuGet packages. For that purpose, this workspace configuration includes two more files. The first is build.cake, a Cake-based build script that ensures the required package output.
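Something along these lines (a hypothetical sketch; the real build.cake in the lab may define different tasks and paths):

```csharp
// build.cake (hypothetical sketch)
var target = Argument("target", "Pack");

Task("Pack")
    .Does(() =>
{
    // pack the rules and messages so the extensibility system can
    // later provision them as NuGet packages
    var settings = new DotNetCorePackSettings { OutputDirectory = "./output" };
    DotNetCorePack("./StoneAssemblies.MassAuth.QuickStart.Rules", settings);
    DotNetCorePack("./StoneAssemblies.MassAuth.QuickStart.Messages", settings);
});

RunTarget(target);
```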


The second is tye.yaml, which will help us run and debug the solution.
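A minimal sketch, where the service layout and the RabbitMQ dependency are assumptions on my part:

```yaml
# tye.yaml (hypothetical sketch)
name: stoneassemblies-massauth-quickstart
services:
  - name: services
    project: StoneAssemblies.MassAuth.QuickStart.Services/StoneAssemblies.MassAuth.QuickStart.Services.csproj
  - name: authserver
    project: StoneAssemblies.MassAuth.QuickStart.AuthServer/StoneAssemblies.MassAuth.QuickStart.AuthServer.csproj
  - name: rabbitmq
    image: rabbitmq:3-management
    bindings:
      - port: 5672
```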
 


Step 2: Contract first

Let's add a bit of complexity to the generated weather forecast example. For instance, let's say we will allow requesting forecasts from a certain date, since some forecasts may not be available due to the complexity of the calculations.

For that purpose, we will add a class to the messages project to request the weather forecast with the start date as an argument.
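A minimal sketch (assuming the message types derive from a base class in StoneAssemblies.MassAuth.Messages; the exact base type may differ):

```csharp
using System;
using StoneAssemblies.MassAuth.Messages;

namespace StoneAssemblies.MassAuth.QuickStart.Messages
{
    // the message that will be validated by the authorization rules
    public class WeatherForecastRequestMessage : MessageBase
    {
        // the date from which the forecast is requested
        public DateTime StartDate { get; set; }
    }
}
```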

Step 3: Implementing rules

Now we are ready to implement some rules for such a message. Continuing with our scenario, let's say the forecast data is only available from today and up to 10 days ahead. This check could be more complex, for instance a query to an external database, but for simplicity it will be implemented as follows.
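A sketch of the rule; note that the rule interface name and the EvaluateAsync signature here are assumptions, not the verified library API:

```csharp
using System;
using System.Threading.Tasks;

public class ForecastAvailabilityRule : IRule<WeatherForecastRequestMessage>
{
    public Task<bool> EvaluateAsync(WeatherForecastRequestMessage message)
    {
        // forecast data is only available from today and up to 10 days ahead
        var isAvailable = message.StartDate.Date >= DateTime.Today
                          && message.StartDate.Date <= DateTime.Today.AddDays(10);

        return Task.FromResult(isAvailable);
    }
}
```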


Step 4: Implementing services

It's time to complete the WeatherForecastController implementation in the service project. It should look like this. 
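A sketch of what it could look like; the AuthorizeByRule attribute and the message type come from this lab, while the response shape follows the default template:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("[controller]")]
public class WeatherForecastController : ControllerBase
{
    [HttpGet]
    [AuthorizeByRule] // the message below must pass the rules before this runs
    public IEnumerable<WeatherForecast> Get([FromQuery] WeatherForecastRequestMessage message)
    {
        var rng = new Random();
        return Enumerable.Range(0, 5).Select(offset => new WeatherForecast
        {
            Date = message.StartDate.AddDays(offset),
            TemperatureC = rng.Next(-20, 55),
        });
    }
}
```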


Notice the usage of the AuthorizeByRule attribute on the Get method, indicating that the input message WeatherForecastRequestMessage must be processed and validated by the authorization engine before the method executes.

We also have to update the Startup class implementation.
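The relevant fragment could look like this (the AddMassAuth method is named just below; its exact signature is an assumption):

```csharp
public void ConfigureServices(IServiceCollection services)
{
    services.AddControllers();

    // registers the MassAuth services and the MassTransit-based
    // communication with the message broker
    services.AddMassAuth(this.Configuration);
}
```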

  
Basically, the AddMassAuth service collection extension method is called to register the library's services and to ensure communication through the message broker. Remember, StoneAssemblies.MassAuth is built on top of MassTransit. Finally, to read the configuration via environment variables, we must update the Program class like this.
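A sketch of the Program change, adding environment variables to the host configuration:

```csharp
public static IHostBuilder CreateHostBuilder(string[] args) =>
    Host.CreateDefaultBuilder(args)
        .ConfigureAppConfiguration(builder => builder.AddEnvironmentVariables())
        .ConfigureWebHostDefaults(webBuilder => webBuilder.UseStartup<Startup>());
```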

Step 5: Hosting rules

To host rules, we provide a production-ready build of StoneAssemblies.MassAuth.Server as a Docker image available on DockerHub. But for debugging, or even for customization purposes, it could be useful to build your own rule host server. So, in the server project, we also have to update the Startup class implementation to initialize the extensibility system and load the rules.
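A hypothetical sketch only, since the StoneAssemblies.Extensibility API is not shown in this post and may differ substantially:

```csharp
public void ConfigureServices(IServiceCollection services)
{
    // hypothetical call: initialize the extensibility system so rule
    // packages can be pulled from the configured NuGet feeds as plugins
    services.AddExtensions(this.Configuration);

    services.AddMassAuth(this.Configuration);
}
```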


Again, to read the configuration via environment variables, the Program file must be updated in the same way as in the services project.


Step 6: Build, run and test

Let's see if this works. So, cross your fingers first ;)

To build and run the project, open a PowerShell terminal in the working directory and run the following commands.
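A sketch of the sequence; the exact tool invocations in the lab may differ:

```powershell
dotnet tool restore   # restore local tools (cake, tye), if configured that way
dotnet cake           # run build.cake to pack the rule and message NuGet packages
tye run               # start all the services described in tye.yaml
```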


Open your browser and navigate to http://localhost:8000 to display the Tye user interface. 


You can check the logs of the rule host server to see how the extensibility system works.


Let's do some weather forecast requests. For instance, with a valid request,

the output looks like this


but with an out-of-range request,


the output shows an unauthorized response.
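If you prefer the command line over the browser, requests along these lines should reproduce both cases (the port and route are assumptions based on the default template):

```powershell
# valid request: a start date within the next 10 days
Invoke-RestMethod "http://localhost:5000/WeatherForecast?StartDate=$((Get-Date).AddDays(1).ToString('yyyy-MM-dd'))"

# out-of-range request: more than 10 days ahead, rejected by the gatekeeper
Invoke-RestMethod "http://localhost:5000/WeatherForecast?StartDate=$((Get-Date).AddDays(30).ToString('yyyy-MM-dd'))"
```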



So, as expected, it works ;)


Closing

In case it doesn't work for you, you can always review the final and complete source code of this hands-on lab, available in the StoneAssemblies.MassAuth.QuickStart repository on GitHub.

Remember, StoneAssemblies.MassAuth is a work in progress and we are continuously releasing new versions, so your feedback is welcome. Also, remember that it is an open-source project, so you can contribute too.

Enjoy «authorizing» with pleasure and endless possibilities.

Wednesday, July 21, 2021

Introducing StoneAssemblies MassAuth

What is Stone Assemblies in the first place?

It was a long road to making this decision. I worked for state-owned companies for years; in fact, I still do, but now I try to see this from a different perspective. I did not keep any of the things that I built or designed in those years of work, except maybe the professional experience.

Now, I have decided to start over by creating something more personal. Stone Assemblies is a new software development and consulting group, or organization, or project. Actually, if you are running a software business or starting a new one, you probably need us ;).

I cannot say that Stone Assemblies is a company or a small or medium-sized business (a.k.a. PYME). There are many regulations here in Cuba, and I will probably need some legal assistance to reach that point. So, let's keep this simple: I just founded Stone Assemblies and, of course, I'm the lead software engineer. That sounds great, doesn't it?

The story of the name is quite funny, so I will probably tell you one day about it, but not today.


Some background and context

Last year, I was outlining strategies to modernize the systems of an organization, implementing proofs of concept that would allow their legacy system to support more workload.

This particular system is, in fact, a large database-centric monolith. There are several options to scale this kind of system.

One of these options is vertical scaling, but the system could eventually reach the same saturated state. In fact, the database provider may limit the use of all the available resources; such limitations may even be written into the license. But the main reason is that computational resources are finite.

The other option is horizontal scaling. But even when the database vendor offers horizontal scaling, it doesn't change the fact that the system is a monolith, with all the well-known issues of that kind of architecture. Again, this option could also be limited by the license.

Our research led us to review some well-known patterns, and we started with: 

Command and Query Responsibility Segregation (CQRS): a pattern that separates read and update operations for a data store. Implementing CQRS in your application can maximize its performance, scalability, and security. The flexibility created by migrating to CQRS allows a system to better evolve over time and prevents update commands from causing merge conflicts at the domain level.

in combination with a migration approach:

Strangler Fig: Incrementally migrate a legacy system by gradually replacing specific pieces of functionality with new applications and services. As features from the legacy system are replaced, the new system eventually replaces all of the old system's features, strangling the old system and allowing you to decommission it. 

and, looking to maximize the added value of a minimum viable product (MVP), we identified one key feature of the legacy system, so we decided to also include this pattern:

Gatekeeper: Protect applications and services by using a dedicated host instance that acts as a broker between clients and the application or service, validates and sanitizes requests, and passes requests and data between them. This can provide an additional layer of security, and limit the attack surface of the system.


What is Stone Assemblies MassAuth?

The key feature I mentioned earlier is an extensive, exhaustive validation subsystem that executes before any command. But all these validations (query-based operations) run against the single main database, demanding computing resources to the detriment of the performance of other commands that could be executing at the same time. Remember, this is a very high concurrency system.

The diagram below shows the monolithic shape of the system. 

Monolithic shape 

So, why not move all those validations to an isolated subsystem? The idea is to replicate all the slowly changing data that is used in the validation queries. With this single action, all validation workloads execute in isolation, with the advantage that it is also possible to assign dedicated resources to these tasks. The following diagram depicts this scenario.

Validation workloads run in isolation (CQRS)

With this idea in mind, it is not hard to generalize a solution that intercepts any message and executes a preflight check without impacting the core system. This is the idea behind StoneAssemblies.MassAuth.

StoneAssemblies.MassAuth is a free, open-source, distributed, extensible, message-based authorization framework built on top of MassTransit. It allows you to improve responsiveness and throughput through a loosely coupled, message-driven approach.

The following diagram shows the result of starting to strangle the monolithic system, segregating command and query responsibilities, and using StoneAssemblies.MassAuth.

Result of system modernization and evolution


By the way, if you didn't notice yet, StoneAssemblies.MassAuth is a Gatekeeper implementation. 

StoneAssemblies.MassAuth is also built on top of the powerful and insanely flexible extensibility system StoneAssemblies.Extensibility. So, basically, all authorization rules can be provisioned as plugins for the authorization engine, but that's part of another story. I will tell you more about it in the future.

Send us your feedback

This is a work in progress and we are continuously releasing new versions. Sources and sample code are available on GitHub, including some benchmarks. Packages, as usual, are available in the NuGet gallery, and we also provide a production-ready build of StoneAssemblies.MassAuth.Server (a.k.a. the authorization engine) as a Docker image available on DockerHub.

Without much more to say, I just invite you to use StoneAssemblies.MassAuth and let me know what you think. You can also look forward to the next post, where I will explain how to get started with StoneAssemblies.MassAuth in a quick-start lab.

Remember, this is an open-source project, so you are welcome to contribute. There is a lot of work to do, so you can contribute by creating tickets, submitting pull requests, or just inviting me for a coffee ;)

Enjoy «authorizing» with pleasure.

Saturday, September 26, 2020

Stressless in the new jungle

This is my personal journey of buying on TuEnvio.

What is TuEnvio?

Described by CIMEX itself, TuEnvio is an “E-Commerce platform created by the CIMEX corporation for the national customer, which allows online purchases from the comfort of your home”. 

But you may wonder: why is this news? The fact is that the expansion of Internet access in Cuba is a recent phenomenon. From almost zero, without infrastructure, in a couple of years Internet access has become almost a reality for many Cubans. Right, it is very expensive thanks to ETECSA, but it continues to expand, which is good.

Since Cubans had no Internet access, “no one” worried about selling products online; at least, not to Cubans who live in Cuba. Therefore, they “invented” a service called EnviosCuba so that families abroad could buy products for their relatives on the island. A kind of favor-based business model, which is very sad: an approach that only captures foreign currency instead of also thinking about the prosperity and comfort of the Cubans who live in Cuba.

But then SARS-CoV-2 arrived. They were forced to launch a service on a scale for which they were neither technologically nor logistically prepared. Its name: TuEnvio.


The new jungle

TuEnvio looked promising. Several instances of the store, distributed across some physical stores, showed their “stock” online. Users were able to navigate, search, and buy. But something was not right. Buying what you needed wasn't exactly that easy. Eventually, you could catch something, but the stress began to increase. Like a vigilante, you had to stay up late at night to buy a highly demanded product.

TuEnvio doesn't have a native notification system, so I started implementing something to help me stay tuned. I was at home (remember COVID-19), I was bored, but most importantly, I had to buy.

That was the birth of  YourShipping.Monitor as a project. 



The first step was implementing a basic scraping system to be notified of the availability of products, including some searches by keywords. To improve the notification system, I also implemented a personal Telegram bot that allows me some basic interactions.

So, the idea was to create an application similar to CamelCamelCamel with target TuEnvio. But everything would change when The new jungle arises. 

Shoot first, ask later

The best description of the situation was published in this video, a "parodied" scene from The Big Bang Theory television series. By the way, to understand what is happening, you need to read the Spanish subtitles ;). I'm not sure who the original author is, but it rocks. If you know them, please let me know so I can update this post.



It turns out that shopping at TuEnvio wasn't that easy. Only a few people ever saw the products, because they accessed the store at exactly the right time. Leaked links?

On the other hand, the workload generated by the simultaneous access of thousands of people was handled by DATACIMEX's developers with an incorrect caching approach. If someone didn't see a product at the right time, they had to wait for the cache to be invalidated within the next 3 minutes.

This, combined with the limited offer, meant that the majority of TuEnvio's users were unable to purchase a thing. Worse still, they didn't even see a single product.

Under these circumstances, YourShipping.Monitor's goals changed. I still needed the notifications, but I actually needed to interact with the store at light speed to add products to the shopping cart.
  
I almost forget that this is also a technical post. So, here we go.

Parallel web scraping

YourShipping.Monitor is implemented using the full .NET Core stack, including a Blazor frontend. It allows me to track stores, departments, and products from their uniform resource locators (URLs). The user enters the link, and a background process extracts the information and also tries to interact with the options of the store with a single rule: add a product to the cart at first sight.

But what if I'm looking at the wrong department? What if one product becomes available in the very same second as another? This is why it was important to send as many requests as possible at the same time. Using the asynchronous capabilities of C# in combination with the AsyncEnumerable library, I was able to do it.
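The original code is not reproduced here, but the idea can be sketched like this, using plain C# async streams rather than the AsyncEnumerable library the post mentions:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;

public static class Scraper
{
    public static async IAsyncEnumerable<string> ScrapeAllAsync(
        HttpClient client, IEnumerable<string> urls)
    {
        // start every request at the same time...
        var tasks = urls.Select(url => client.GetStringAsync(url)).ToList();

        // ...and yield each page as soon as its response arrives
        foreach (var task in tasks)
        {
            yield return await task;
        }
    }
}
```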



But it wasn't just me. A community of Cuban developers launched several applications to help people buy. Even though such applications required user interaction, the workload affected the store's servers a lot. So, CIMEX responded with an anti-scraping approach.

Fighting against the anti-scraping system

One day the scraper stopped working. All requests were redirected to a page whose only purpose was to execute a JavaScript challenge.



It was easy to figure out what was happening: they expect a cookie with a value generated by that JavaScript. I was already using AngleSharp to explore the DOM elements. Could it be possible to evaluate such a function and acquire the value of the cookie using the same library? The answer is yes. AngleSharp.Js is an experimental extension that allows you to run simple JavaScript functions. So, after capturing the parameters with a regex, I was able to call the function and capture the cookie value as well.
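A sketch of the approach (AngleSharp.Js is experimental, so treat the details as assumptions):

```csharp
using AngleSharp;
using AngleSharp.Js;

// enable the experimental JavaScript engine
var configuration = Configuration.Default.WithJs();
var context = BrowsingContext.New(configuration);

// challengeHtml is the page returned by the redirect (assumed variable)
var document = await context.OpenAsync(request => request.Content(challengeHtml));

// after the inline script runs, the computed value ends up in the cookie
var cookie = document.Cookie;
```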


Moving to unattended mode 

At this point, I was creating the session with the browser, saving the cookies.txt file, and making it available to the scraping server (a.k.a. YourShipping.Monitor.Server). The main reason: the captcha. But TuEnvio's captcha looks like this.




Actually, it doesn't look like a very hard captcha. Nothing that hasn't been broken before with tesseract-ocr. So, I just added a reference to a .NET wrapper of Tesseract and wrote something down.
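Something along these lines, using the Tesseract NuGet wrapper (the tessdata path and file names are assumptions):

```csharp
using Tesseract;

using var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default);
using var image = Pix.LoadFromFile("captcha.jpg");
using var page = engine.Process(image);

var captchaText = page.GetText().Trim(); // the recognized captcha characters
```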


and you know what? It worked.

Final thoughts  

I know, this didn't seem stressless at all, but yeah, now it is. With YourShipping.Monitor and a bit of luck, I have been able to capture something in TuEnvio's stores. There is no guarantee, though, so I keep insisting that ETECSA should not charge for access to virtual stores. One can spend more money trying to buy than actually buying.


Recently, CIMEX released the stores' opening schedule. So now, with the effective combination of my command-line tool nauta-session to manage Nauta Hogar sessions, I can finally go to sleep, stressless 😉.

Tuesday, January 7, 2020

Getting started with Blorc.PatternFly

Original Published on PatternFly Medium Publication


If you’re a developer who loves hands-on tactical tutorials, then read on. Today, we’re covering Blorc.PatternFly.

First off, the basics: What is Blorc.PatternFly? Standing for Blazor, Orc, and PatternFly, Blorc.PatternFly is a library with the ultimate goal of wrapping all PatternFly components and making them available as Blazor components.

Now let’s jump into a tutorial. Keep in mind that this tutorial isn’t meant as an overview of Blazor — you’ll need some basic knowledge of Blazor before diving in.

You’ll also need to have these tools handy:
  • Visual Studio 2019 (16.4.2)
  • Blazor (3.1.0-preview4.19579.2)

Step 1: Creating the project

First, go through the Get started with ASP.NET Core Blazor tutorial, choosing the Blazor WebAssembly experience. You'll create the Blazor project in this tutorial, and you'll only have to convert the Bootstrap look and feel to PatternFly. For the purpose of this guide, use Blorc.PatternFly.QuickStart as the project name.

Follow the on-screen instructions of the Visual Studio project:

Create a new project.

Configure your new project.

Create a new Blazor app with Blazor WebAssembly experience.

The Blazor template is built on top of Bootstrap. So the resulting app looks like this:
Index.razor and SurveyPrompt.razor
Counter.razor
FetchData.razor
From here, you’ll replace the Bootstrap look and feel with the PatternFly one.

Step 2: Startup configuration

Once the project has been created, add Blorc.PatternFly as a package reference via NuGet. At the time of writing this article (which I hope you’re enjoying!), this package is only available as prerelease. To install the latest prerelease version, check the Include prerelease option in the Package Manager.

Adding latest prerelease of Blorc.PatternFly package
Also, it's mandatory to register the Blorc.Core services in the ConfigureServices method of the Startup class, as shown below:
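Something like the following (the registration method name is hypothetical; check the Blorc documentation for the exact call):

```csharp
public void ConfigureServices(IServiceCollection services)
{
    // hypothetical: registers the Blorc.Core services
    services.AddBlorcCore();
}
```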


Once the Blorc services are registered, it’s time to start replacing the UI elements, starting with the content of the index.html and site.css files.


To make sure that no unused dependencies are being deployed, remove the bootstrap and open-iconic directories from the wwwroot/css directory.

Step 3: Updating pages and components

The time has come to update the components. You should be able to update the content of the razor files with references to the available Blorc.PatternFly components.

You can do this yourself by following the steps below or you can clone the repository with the source code of this tutorial.

For instance, the MainLayout component must inherit from PatternFlyLayoutComponentBase, and you can use the Page component as follows:
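A sketch of MainLayout.razor (the inherited base class is named in this post; the Page markup is illustrative):

```
@* MainLayout.razor (sketch) *@
@inherits PatternFlyLayoutComponentBase

<Page>
    @Body
</Page>
```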

For the NavMenu, you could use the Navigation component and update the razor file as shown below:

Finally, update the content of the Counter and FetchData pages.


And that’s it! Great work. Your application should now look like the screenshots below:

Index.razor and SurveyPrompt.razor
Counter.razor

FetchData.razor

Send us your feedback

Keep in mind that the library is a work in progress, and there are still a few PatternFly components being implemented. We are continuously releasing new versions. The good news is that Blorc.PatternFly is open source, and the sources are available on GitHub.

If you would like support for any new component, contribute to the Blorc.PatternFly library on GitHub. You can get in touch by:
  • Creating tickets.
  • Contributing by pull requests.
  • Contributing via Open Collective.

Finally, if you want to see the latest develop branch of Blorc.PatternFly in action, you can browse to the live demo with a full overview of all the PatternFly components already available for Blazor. And you’ll probably agree: PatternFly and Blazor are awesome — and combined, they are a beautiful pair.

Interested in contributing an article to the PatternFly Medium publication? Great! Submit your topic idea, and we’ll be in touch.

Monday, December 16, 2019

How to avoid copying movies that you will never play?


Introduction

This is a kind of odd title for a technical post. But yes, it is a technical post. Actually, it doesn't look like a real problem. But yes, it is a real one.

It turns out that I have a compulsion and an obsession with watching movies. Better said, with copying and organizing movies on my personal storage. But some of those movies will never be played.

Recently, I also noticed that I am running out of space. A well-known approach to solve this situation could be to eliminate all those movies that I never played or all that I really don't like.

But I could also try something else and take advantage of this situation, something more productive for a Sunday afternoon, because at some point I will be in the same position again.


The root of all evil

In Cuba, Internet access is very expensive. You can check the prices yourself on the official website of the only Internet service provider (a.k.a. ETECSA). Therefore, regular Cubans don't use Netflix, nor do they use the Internet to download large multimedia files (at least not from home).

Such a situation has created a unique business model, one that probably only works in Cuba: an offline alternative to a media service provider, code name "El paquete" (the package).

I will not give you too many details about this service. All you need to know is that the package distributes a lot of movies every week via USB drives. The media content includes the latest premieres as pirated cinema copies, improved cinema copies, HD copies with Chinese subtitles, Full-HD versions, classic movies, animated movies, a specific actor's cycle, and so on. The package also includes some television programs, series, sports, contests, etc. About 1 TB per week in media files.

But my personal OCD is about movies, and I copy them all. This is not exactly a healthy approach for my very limited personal storage.

Everything got "worse" when I met Emby

Emby is a media server designed to organize, play, and stream audio and video to a variety of devices, as you can read here. Therefore, my movie-copying routine now includes downloading all the movie metadata: the original title, the tag line, poster and backdrop images, the cast, community rating, critic rating, genres, and all the other information available from sites like IMDb or TheMovieDB, which is stored in the server database and also in local .nfo files next to each movie file.

This metadata enriches the user experience and is displayed when someone browses the media content from a client like Emby for Roku, directly from the TV.

Spider-Man: Into the Spider-Verse (Emby for Roku)

As you can also notice in the picture above, Emby also tracks the movies that I already played. Wait a second. That looks like a perfect ground truth to be used to solve a classification problem.

Deep learning to the rescue

Sundays are good days to spend time with the family and watch movies. But I couldn't find the right one yesterday. I'm also close to zero free space for the next release of the package.

So, I just needed to try something deep ;). Something that could work as a long-term approach.

Yes, I know. I haven't written much on this blog for a while. But remember, I'm training Alexa every day, and she demands a lot of my time ;). She only left me time to publish Computing Anomaly Score Threshold with Autoencoders Pipeline, and then I completely forgot to mention it here. But that will be the subject of the next post (or the one after). So, let's go back to the movies.

The Emby server has an SQLite database (library.db). I explored the data and extracted all the information useful for my problem with a simple join of two tables, MediaItems and UserDatas.
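A sketch of such a query; only the two table names come from this exploration, while the join key and column names are assumptions about the Emby schema:

```sql
SELECT m.Name,
       m.OfficialRating,
       m.CommunityRating,
       m.CriticRating,
       m.Genres,
       u.Played
FROM MediaItems m
JOIN UserDatas u ON u.Key = m.UserDataKey;
```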

Sample of extracted data from Emby database
At this point, I thought it was a good time to try the ML.NET Model Builder (Preview), but the extension size is about 150 MB. Too large for a Sunday at home. The .NET solution to this problem has to wait until I finish writing this post, or maybe until next weekend.

Deeplearning4j (DL4J) is already cached on my local nexus. So, here we go.

Let's do this straightforward

There is enough documentation about DL4J, even a book, Deep Learning: A Practitioner's Approach. So, this will be fast. I will try not to repeat any step available online, but you will probably notice some resemblance to Paul Dubs's excellent quick-start tutorial, since this is exactly a classification problem.

Yes, in case you didn't notice yet: this is a classification problem, and quite a simple one. I have to predict whether I will play a movie from the following features: Official Rating, Community Rating, Critic Rating, and Genres, in correlation with my own playback actions.

First, I split the existing data: 80% for the training data set and 20% for the evaluation data set. I also stored the analysis of the full data set, to normalize both using the same statistics.

Then I transformed the data using DataVec.
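A hedged sketch of what that transform could look like (the schema reflects the features listed above; the actual steps may differ):

```java
// declare the raw schema of the extracted data
Schema schema = new Schema.Builder()
        .addColumnCategorical("OfficialRating", officialRatings)
        .addColumnDouble("CommunityRating")
        .addColumnDouble("CriticRating")
        .addColumnCategorical("Genres", genres)
        .addColumnInteger("Played") // the label: 1 if played, 0 otherwise
        .build();

// one-hot encode the categorical features so the network can consume them
TransformProcess transformProcess = new TransformProcess.Builder(schema)
        .categoricalToOneHot("OfficialRating", "Genres")
        .build();
```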


Followed by this network configuration:
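A sketch of a small feed-forward classifier; the original hyperparameters are not shown here, so these are illustrative:

```java
MultiLayerConfiguration configuration = new NeuralNetConfiguration.Builder()
        .seed(42)
        .updater(new Adam(0.001))
        .weightInit(WeightInit.XAVIER)
        .list()
        .layer(new DenseLayer.Builder()
                .nIn(numFeatures).nOut(64)
                .activation(Activation.RELU)
                .build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(64).nOut(2) // two classes: played / not played
                .activation(Activation.SOFTMAX)
                .build())
        .build();
```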


Finally, I set up the early stopping trainer to save the best model:
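Roughly like this (directories and termination conditions are assumptions):

```java
EarlyStoppingConfiguration<MultiLayerNetwork> esConf =
        new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
                .epochTerminationConditions(new MaxEpochsTerminationCondition(300))
                .scoreCalculator(new DataSetLossCalculator(evalIterator, true))
                .evaluateEveryNEpochs(1)
                .modelSaver(new LocalFileModelSaver("models")) // keeps the best model on disk
                .build();

EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, configuration, trainIterator);
EarlyStoppingResult<MultiLayerNetwork> result = trainer.fit();
```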


And done.


The results

Well, the results are quite impressive and also suspicious. But there is no problem at all. The network perfectly isolates the movies that I already played in the evaluation data set.


Played movies from the evaluation data set.
Now, I'm ready for the next release of the package. 

Wait a second. I just remembered that I have an isolated copy of last week's package, with 58 movies in the inbox, already processed by Emby. After running the prediction program, the assistant neural network (the result of the training process) recommends that I copy only 7 movies. Yes, I can deal with that.

Prediction over the last week package

Not too bad for a Sunday, right? But it probably requires some tuning (or watching more movies). I'm not sure the adversarial network (myself) will allow ignoring Ad Astra. Or will it? ;)

Wednesday, July 11, 2018

Introducing myself into Deep Learning

Overview

It has been some time since my last blog post. Actually, it has been more than two years. The main reason is that something changed in my life when I started to train a neural network. Her name is Alexa.

Just to avoid confusion, let me clarify that I am not a member of the Amazon Alexa team. Alexa is the female version of my own name, and it is the name of my little girl ;)

She is one of the reasons for this deep learning journey.

How did the journey begin?

Every journey has a motivation, and this one is no exception. It started on September 19, 2016, when I held her for the very first time. After watching her for a while, I asked myself: how is it possible that she will be able to learn anything in the future?

Months passed, and I saw her learning so fast, effortlessly, without too much “computational power” (apparently). She learned to walk, to dance, to almost talk, to solve a puzzle, to play basketball, to solve the pyramid-piling rings puzzle, and even to scramble a Rubik's cube ;)

Solving puzzle
 Defending vs. Elizabeth
Scrambling a Rubik's cube
Beyond the obvious answers, that she will learn by design, there is a lot of trial and error in her learning process: several attempts to find the best fit before she can learn something. I love to see her “computing the distance” from the expected result in her attempts at solving the pyramid-piling rings puzzle, removing a wrong ring and replacing it with the right one.

Dad is trying to build something
with blocks but I'm interested
in the neighbor’s dog ;)
What really happened is that her learning process motivated me to explore something “new”: a discipline that is destined to be (if it is not already) the new toolset for every single software developer. It turns out that the new hobby came with a practitioner's approach, but it had to be found first.

What happened next?

At this point, I started to watch the machine learning course videos of Andrew Ng using alternative sources. Cubans (who live in Cuba) are not able to access the certification program at Coursera. In some way, the USA embargo on Cuba (specifically, the USA export laws) also affects the global democratization of deep learning.

However, it doesn't worry me too much; it actually never does. I can't get the certification, but I can get the knowledge. Andrew's course is genuinely motivating and didactically insuperable. It was able to bring back to life some math and algebra that I thought was dead in my brain, and it made me feel very comfortable implementing a vectorized version of the Stochastic Gradient Descent algorithm in Octave.
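For reference, the vectorized batch update from the course looks like this (the stochastic variant applies the same rule one training example at a time):

\[
\theta := \theta - \frac{\alpha}{m} X^{\top} (X\theta - y)
\]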


After understanding how this works ("almost", just like Andrew used to say) and what kinds of problems it could solve, I just wanted to put it into practice at the production level. Some researchers (friends of mine) sold me TensorFlow as the Holy Grail, but I had some doubts about Python performance (I still have them).

The new hobby comes with the
practitioner approach
After some research, I found exactly what I was looking for: a deep learning library for the JVM. Deeplearning4j (DL4J) is an excellent library and comes with an excellent book, Deep Learning: A Practitioner's Approach, by Adam Gibson and Josh Patterson. I just needed to read the preface to identify myself as a deep learning practitioner. It wasn't too hard to notice that it was the right book for me. I'm pretty sure it is also the right book for you.

DL4J also comes with a lot of helpful features and tools to assist the training process, including the training UI, DataVec, early stopping, and even GPU support, but we can talk about these in forthcoming posts.
DL4J Performance - examples/sec on Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (No GPU required in this case)
Nice normal distribution shape for weights in the Layer Parameters Histogram (So, no regularization issues)
Some network details
Recently, I also started working on a proof of concept of an anomaly detection system built on top of DL4J, specifically using autoencoder networks, with promising results, but I can't give you anything in advance yet (just wait for it).

Autoencoders are supported by DL4j

Conclusions

I have a lot to learn about deep learning, but the journey has already begun and I have the intention to share it with you.

Btw, if you are not motivated to go “deep” with machine learning yet, become a father (or mother) first, and then let me know. I'm sure you will find the same "biological" inspiration when you witness what "training a neural network" looks like ;)

Alexa talking with a large predator cat of stone
