like was told to me

Saturday, September 26, 2020

Stressless in the new jungle

This is my personal journey buying in TuEnvio.

What is TuEnvio?

Described by CIMEX itself, TuEnvio is an “E-Commerce platform created by the CIMEX corporation for the national customer, which allows online purchases from the comfort of your home”.

But you may wonder: Why is this new? The fact is that the expansion of Internet access in Cuba is actually a new phenomenon. From almost zero, without infrastructure, in a couple of years, Internet access for many Cubans is almost a reality. Right, it is very expensive thanks to ETECSA, but it continues to expand, which is good.

Since Cubans had no internet access, “no one” worried about selling products online. At least, not for Cubans that live in Cuba. Therefore, they “invented” a service called EnviosCuba for foreign families could buy products for their national’s relatives. A kind of favor-based business model, which is very sad. An approach to only capture foreign currency instead also think in the prosperity and comfort of Cubans that live in Cuba.

But the SARS-Cov-2 arrived. They would be forced to launch a service on a scale for which they were neither technologically nor logistically prepared. Its name TuEnvio.

The new jungle

TuEnvio looked promising. Several instances of the store, distributed in some physical stores, showed its “stock” online. Users were able to navigate, search and buy. But somewhat was not right. Buying what you needed wasn't exactly that easy. Eventually, you could catch a thing but the stress began to increase. As a vigilante, to buy a high demanded product, you had to stay up late at night.

TuEnvio doesn't have a native notification system, so I started implementing something to help me stay tuned. I was at home (remember COVID19), I was bored, but most importantly I had to buy.

That was the birth of YourShipping.Monitor as a project.

The first step was implementing a basic scraping system to be notified of the availability of products, including some searches by keywords. To improve the notification system, I also implemented a personal Telegram Bot, that also allows me some basic interactions.

@cimex_cuba #TuEnvio #Telegram #Bot #Demo 😉

/search agua pic.twitter.com/3fTR2z3hIp
— Alexánder Fernández (@alexfdezsaucoCU) May 31, 2020

So, the idea was to create an application similar to CamelCamelCamel with target TuEnvio. But everything would change when The new jungle arises.

Shoot first, ask later

The best description of the situation was published in this video. A "parodied" scene from The Big Bang Theory television series. By the way, to understand what is happening you need to read the subtitles in Spanish ;). I'm not sure who the original author is. But it rocks. If you know him, please just let me know to update this post.

It turns out that shopping at TuEnvio wasn't too easy. Only a few viewed the products because they accessed them at the right time. Links leak?

On the other hand, the workload generated by the simultaneous access of thousands of people was handled by DATACIMEX's developers with an incorrect caching approach. If someone doesn't see a product at the right time, should wait for the cache to be invalidated within the next 3 minutes.

This, combined with the limited offer, meaning that the majority of TuEnvio's users were unable to purchase a thing. Worse still, they didn't even see a single product.

Under these circumstances YourShipping.Monitor's goals changed. I needed the notifications. But actually, I needed to interact with the store in light speed mode to add products to the shopping cart.

I almost forget that this is also a technical post. So, here we go.

Parallel web scraping

YourShipping.Monitor is being implemented using the NetCore full stack including the frontend with Blazor. It allows me to track stores, departments, and products from its uniform resource locator (URL). The user must enter the link and a background process extracts the information and also tries to interact with the options of the store with a single rule: add a product to the cart at first sight.

But what if I'm looking to the wrong department? What if one product is available in the very same second as another. This is why it was important to send as many requests as possible at the same time. Using the asynchronous capabilities of C# in combination with AsyncEnumerable library, I was able to do it, just like this.

But it wasn't just me. A community of Cuban developers launched several applications to help people to buy. Even when such applications required user interaction, the workload affected the store's servers a lot. So, CIMEX responded with an anti-scraping approach.

Fighting against the anti-scraping system

One day the scraper stopped working. All requests were redirected to a page to execute this JavaScript code.

It could be easy to figure out what is happening. They expect a cookie, with a value generated in that JavaScript. I'm already using AngleSharp to explore the DOM elements. It might be possible to evaluate such a function, to acquire the value of the cookie, using the same library? The answer is yes. AngleSharp.Js is an experimental extension that allows you to run simple JavaScript functions. So, after capturing the parameters with regex, I was able to call the function to capture the cookie value as well.

Moving to unattended mode

At this point, I was creating the session with the browser, saving the cookies.txt file, and making it available to the scraping server (a.k.a. YourShipping.Monitor.Server). The main reason, the captcha. But TuEnvio's captcha looks like this.

Actually, it doesn't look like a very hard captcha. Nothing that has not been broken before with tesseract-ocr. So, just added the reference to a .NET wrapper of tesseract and wrote down this

and you know what? It worked.

Final thoughts

I know, this doesn't seem a bit stressful, but yeah, now it is. With YourShipping.Monitor and a bit of luck, I have been able to capture something in TuEnvio's stores. There is no guarantee, so I always insist that ETECSA should not charge for access to virtual stores. Someone can spend more money trying to buy than buying.

Recently, CIMEX released the store's opening schedule. So now, with the effective combination of my command-line tool nauta-session, to manage Nauta Hogar sessions, I can already go to sleep, stressless 😉.

Me cuadró. Ya me puedo acostar a dormir 😉. pic.twitter.com/Xe1UwFYnPC
— Alexánder Fernández (@alexfdezsaucoCU) September 25, 2020

Tuesday, January 7, 2020

Getting started with Blorc.PatternFly

Original Published on PatternFly Medium Publication

If you’re a developer who loves hands-on tactical tutorials, then read on. Today, we’re covering Blorc.PatternFly.

First off, the basics: What is Blorc.PatternFly? Standing for Blazor, Orc, and PatternFly, Blorc.PatternFly is a library with the ultimate goal of wrapping all PatternFly components and making them available as Blazor components.

Now let’s jump into a tutorial. Keep in mind that this tutorial isn’t meant as an overview of Blazor — you’ll need some basic knowledge of Blazor before diving in.

You’ll also need to have these tools handy:

Visual Studio 2019 (16.4.2)
Blazor (3.1.0-preview4.19579.2)

Step 1: Creating the project

First, go through the Get started with ASP.NET Core Blazor tutorial for Blazor WebAssembly experience. You’ll create the Blazor project in this tutorial, and you’ll only have to convert the Bootstrap to PatternFly. For the purpose of this guide, use Blorc.PatternFly.QuickStart as the project name.

Follow the on-screen instructions of the Visual Studio project:

Create a new project.

Configure your new project.

Create a new Blazor app with Blazor WebAssembly experience.

The Blazor template is built on top of Bootstrap. So the resulting app looks like this:

Index.razor and SurveyPrompt.razor

Counter.razor

FetchData.razor

From here, you’ll replace the Bootstrap look and feel with the PatternFly one.

Step 2: Startup configuration

Once the project has been created, add Blorc.PatternFly as a package reference via NuGet. At the time of writing this article (which I hope you’re enjoying!), this package is only available as prerelease. To install the latest prerelease version, check the Include prerelease option in the Package Manager.

Adding latest prerelease of Blorc.PatternFly package

Also, it’s mandatory to register the Blorc.Core services in the ConfigureServices method of the Startup class, shown below:

Once the Blorc services are registered, it’s time to start replacing the UI elements, starting with the content of the index.html and site.css files.

To make sure that no unused dependencies are being deployed, remove the bootstrap and open-iconic directories from the wwwroot/css directory.

Step 3: Updating pages and components

The time has come to update the components. You should be able to update the content of the razor files with references to the available Blorc.PatternFly components.

You can do this yourself by following the steps below or you can clone the repository with the source code of this tutorial.

For instance, the MainLayout component must inherit from PatternFlyLayoutComponentBase, and you can use the Page component as follows:

For the NavMenu, you could use the Navigation component and update the razor file as shown below:

Finally, update the content of the Counter and FetchData pages.

And that’s it! Great work. Your application should now look like the screenshots below:

Index.razor and SurveyPrompt.razor

Counter.razor

FetchData.razor

Send us your feedback

Keep in mind that the library is a work in progress, and there are still a few PatternFly components being implemented. We are continuously releasing new versions. The good news is that Blorc.PatternFly is open source, and the sources are available on GitHub.

If you would like support for any new component, contribute to the Blorc.PatternFly library on GitHub. You can get in touch by:

Creating tickets.
Contributing by pull requests.
Contributing via Open Collective.

Finally, if you want to see the latest develop branch of Blorc.PatternFly in action, you can browse to the live demo with a full overview of all the PatternFly components already available for Blazor. And you’ll probably agree: PatternFly and Blazor are awesome — and combined, they are a beautiful pair.

Interested in contributing an article to the PatternFly Medium publication? Great! Submit your topic idea, and we’ll be in touch.

Monday, December 16, 2019

How to avoid copying movies that you will never play?

Introduction

This is a kind of odd title for a technical post. But yes, it is a technical post. Actually, doesn't look like a real problem. But yes, it is a real one.

It turns out that I have a compulsion and obsession to watch movies. It is better to say, to copy and organize movies on my personal storage. But some of those movies will be never played.

Recently, I also noticed that I am running out of space. A well-known approach to solve this situation could be to eliminate all those movies that I never played or all that I really don't like.

But I could also try something else and take some advantage of this situation, something more productive for a Sunday afternoon because at some point I will be in the same position again.

The root of all evil

In Cuba, Internet access is very expensive. You can check the prices by yourself on the official website of only one Internet service provider (a.k.a. ETECSA). Therefore, regular Cubans don't use Netflix, nor use the Internet to download large multimedia files (at least not from home).

Such a situation has created a unique business model, that probably only works in Cuba. An offline alternative of media service provider, code name "El paquete" (the package).

I will not give you too many details about this service. All you need to know is that the package distributes a lot of movies every week via USB drives. The media content includes the latest premiers as pirates cinema copies, improved cinema copies, HD copies with Chinese subtitles, Full-HD versions, classics movies, animated movies, a specific actor's cycle, and so on. The package also includes some television programs, series, sports, contests, etc. About 1 TB per week in media files.

But my personal OCD is about movies, and I copy them all. This is not exactly a healthy approach for my very limited personal storage.

Everything gets "worse" when I meet Emby

Emby is a media server designed to organize, play, and stream audio and video to a variety of devices as you can read here. Therefore, my copy movies routine now includes the download of all movie metadata with the original title, the tag line, poster and backdrop images, the cast, community rating, critical rating, genres, all the information available from sites like IMDb or TheMovieDB that is stored in the server database and also in nfo local files next to each movie file.

These metadata enrich the user experience and are displayed when someone browses the media content from a client like Emby for Roku direct from the TV.

Spider-Man: Into the Spider-Verse (Emby for Roku)

As you can also notice in the picture above, Emby also tracks the movies that I already played. Wait a second. That looks like a perfect ground truth to be used to solve a classification problem.

Deep learning to the rescue

Sundays are good days to spend time with the family and watch movies. But, I couldn't find the right one yesterday. I'm also near to zero space for the next release of the package.

So, I just needed to try something deep ;). Something that could work as a long-term approach.

Yes, I know. I haven't written too much on this blog for a while. But remember I'm training Alexa every day, and she demands a lot of my time ;). She only left me time to publish Computing Anomaly Score Threshold with Autoencoders Pipeline and then I completely forgot to comment about it here. But that will be the subject of the next post (or the next one). So, let's go back to the movies.

The Emby server has an SQLite database (library.db). I explored the data all around and extracted all the useful information to solve my problem with a simple join of two tables MediaItems and UserDatas.

Sample of extracted data from Emby database

At this point, I thought that was good timing to try the ML.NET Model Builder (Preview) but the extension size is about 150 MB. Too large for a Sunday at home. The .NET solution to this problem has to wait until I finish writing this post, or maybe the next weekend.

Deeplearning4j (DL4J) is already cached on my local nexus. So, here we go.

Let's do this straightforward

There is enough documentation about DL4J, even a book Deep Learning: A Practitioner's Approach. So, this will be fast. I will try don't repeat any step available online, but probably you notice some resemblance with the excellent Paul Dubs quick-start tutorial, since this, is exactly a classification problem.

Yes, if you didn't notice yet. This is a classification problem and is a quite simple one. I have to predict if I will play a movie from the following features: Official Rating, Community Rating, Critic Rating, and Genres in correlation with my own playback action.

First, I split the existing data. I created the training data set with 80% and the evaluation data set with 20% from the full data set. I stored the local analysis of the full data set to normalize each one using the same analysis.

Then I transformed the data using DataVect as follow:

Followed by this network configuration:

Finally, I set up the early stopping trainer to save the best model:

And done.

The results

Well, the results are quite impressive and also suspect. But there is no problem at all. The network perfectly isolates the movies that I already played on the evaluation data set.

Played movies from the evaluation data set.

Now, I'm ready for the next release of the package.

Wait a second. I just remember, that I have an isolated copy of the last week's package with 58 movies in the inbox and already processed by Emby. After running the prediction program, the assistant neural network (the result of the training process) recommends that I copy only 7 movies. Yes, I can deal with that.

Prediction over the last week package

Not too bad for a Sunday, right? But probably it requires some tuning (or watching more movies). I'm not sure that the adversary network (myself) allows ignoring Ad Astra. Or yes? ;)

Wednesday, July 11, 2018

Introducing myself into Deep Learning

Overview

It has been some time since my last blog post. Actually, it has been more than two years. The main reason is that something changed in my life when I started to train a neural network. Her name is Alexa.

Just to avoid confusion let me clarify that I am not a member of Amazon Alexa team. Alexa is the female version of my own name and it is the name of my little girl ;)

She is one of the reasons for this deep learning journey.

How did the journey begin?

Every journey has a motivation and this one is not the exception. It started on September 19 of 2016 when I carried her for the very first time. After saw her for a while I asked myself: How is possible that she can learn something in the future?

Months passed, and I saw her learning so fast effortless without too much “computational power” (apparently). She learned to walk, to dance, to almost talk, to solve a puzzle, to play basketball, to solve pyramid-piling rings puzzle, and even to scramble a Rubik’s cube ;)

Solving puzzle

Defending vs. Elizabeth

Scrambling a Rubik's cube

Beyond the obvious answers that she will learn by design, there is a lot of trial and error in her learning process. Several attempts to look for the best fit before she can learn something. I love to see her “computing the distance” from the expected result and her attempts at solving the pyramid-piling rings puzzle by removing a wrong ring and replacing it with the right one.

Dad is trying to build something
with blocks but I'm interested
in the neighbor’s dog ;)

What really happened is that her learning process motivated me to explore something “new”. A discipline that is called to be (if it is not really is) the new toolset for every single software developer. It turns out that the new hobby came with a practitioner approach but it required to be found first.

What happened next?

At this point, I started to watch the machine learning course videos of Andrew Ng using alternatives sources. Cubans (that live in Cuba) are not able to access the certification program at Coursera. In some way, the USA embargo to Cuba – (specifically the USA exportation laws) – also affects the deep learning global democratization process.

However, it doesn't worry me too much, actually never does. I can't get the certification but I can get the knowledge. Andrew’s course is actually motivating and didactically insuperable. It was able to bring back to life some math and algebra that I thought was dead in my brain and made me felt very comfortable implementing a vectorized version of the Stochastic Gradient Descent algorithm with Octave.

"We apologize for the inconvenience" said a @coursera message because also #block us, https://t.co/vsmxsoaMop #Cuba @cubavsbloqueo pic.twitter.com/4JpDtas3PH
— Alexánder Fernández (@alexfdezsauco) August 9, 2017

After understand how this works (almost just like Andrew used to say), and what kind of problems were possible to solve, just I wanted to put this in practice at the production level. Some researchers (friends of mine) sold me Tensorflow as the Holy Grail but I had some doubts about Python performance (still have).

The new hobby comes with the
practitioner approach

After some research, I found exactly what I was looking for. A deep learning library for JVM. Deeplearnig4j (DL4j) is an excellent library and comes with this excellent book Deep Learning: A Practitioner's Approach from Adam Gibson and Josh Patterson. I just needed to read the preface, to identify me as a deep learning practitioner. It wasn't to hard notice that was the right book for me. I'm pretty sure that is also the right book for you.

DL4j also comes with a lot of helpful features and tools to assist the training process including Training UI, datavec, early stopping, even GPU support, and more, but we can talk about this in forthcoming posts.

DL4J Performance - examples/sec on Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (No GPU required in this case)

Nice normal distribution shape for weights in the Layer Parameters Histogram (So, no regularization issues)

Some network details

Recently, I also started to work in build a proof of concept of an anomaly detection system built on top of DL4j, specifically by using Autoencoder networks with promising results but I can't give you anything in advance yet (just wait for it).

Autoencoders are supported by DL4j

Conclusions

I have a lot to learn about deep learning, but the journey has already begun and I have the intention to share it with you.

Btw, if you are not motivated to go “deep” with machine learning yet, be a father (or mother) first, and then let me know. I'm sure you will find the same "biological" inspiration when you witness how to "train a neural network" looks like ;)

Alexa talking with a large predator cat of stone

Thursday, January 28, 2016

Introducing SharePoint Package Manager

Introduction

Half a year ago, I announced – in this post – the forthcoming release of a new secret weapon. I apologize for the delay, but I entertained myself doing other things and I couldn't find a way to spend a couple of days on formulating it.

I could even tell you: ‘I had no time to do it’, but I learned the lesson – maybe I'm still learning – from my father. He always insisted me: ‘Alex, the time is the same’. So, what actually happened was a matter of time management and prioritization.

The true is, I also expected that something like this already exists at this time, but it doesn’t. So, I recalled this, wrote the minimal working code, push the sources and here it is.

What is SharePoint Package Manager?

SharePoint Package Manager

The SharePoint Package Manager is a NuGet-based distribution and deployment system for SharePoint's solutions.

Basically, it's an extension of the SharePoint Central Administration site that automates the process for install or update solution packages from package sources (regular NuGet repositories).

Why NuGet?

Why not? NuGet is the widely accepted and popular package manager among .net developers.

There are also a lot of custom solutions and initiatives that use NuGet as the backend, for instance, ReSharper extension manager, OctopusDeploy, Chocolatey, Squirrel, you know, even Catel extending the modularity options of Prism.

There are only a few rules in order to use this package manager, starting with this 'new' concept: solution package.

What is a solution package?

A solution package is a regular NuGet package with its name ending in '.wsp' and with following structure:

A sample of solution package for SignalR.

Notice how the folder 'content' contains a wsp file with the same name of the solution package and that’s it.

A solution package can also have declared dependencies, but only between solution packages.

Maybe the naming convention looks weak, but there is only one package in the gallery with name Digst.OioIdws.Wsp, so, for me, is enough. Eventually, the package manager could also track non-solution packages through an ignore list.

Solution packages from NuGet Gallery. The Digst.OioIdws.Wsp isn't a solution package.

Managing package sources

The SharePoint Package Manager includes a page to manage the package source. You are able to add, remove, edit, enable or disable a package source.

Creating a new package source

The default package source is the NuGet Gallery.

Installing or updating solution packages

The SharePoint Package Manager also includes a page to manage the farm packages. Looking for the installed solutions in the SharePoint solution storage and making a join with available solution packages from the package sources, the page shows the available solution packages to install or update.

Managing the farm packages

Sorry about the name of the solution package in Spanish on the image above but was client choice and the picture was taken from real life scenario.

The SharePoint Package Manager provides an option to install or update solution packages. After clicking the button – on the right of the solution package info – the system will schedule a job to install or update the last available version of the selected solution package. If the selected package has dependencies, the package manager will also install or update all dependencies in the right order.

Tracking the install or update process.

What's next?

This is almost a draft or a proof of concept, remember just the minimal working code. So, you can try. I also published the SignalR.SharePoint.WSP solution package at NuGet gallery, so, you should see it as an available solution package.

Enjoy it, and let me know what you think. Even better, we can do this together, just fork the source on GitHub. There are a lot of things to do, including better user interface, REST services, a better approach for settings storage, ignore list, performance issues, msbuild extension, Visual Studio extension, and so on. You tell me.

In the meantime, I’m already working to turn more Catel’s rumors in true.

Talking about rumors, if everything goes right, I might update my resume in about seven months. Yes, your assumption is correct, I'm a father candidate ;).

Tuesday, January 12, 2016

'The Force Awakens' or How awake the communication channels on a development team?

Introduction

I’m actually not a follower of the Star Wars saga even when I saw all the seven movies – the last one as a cinema pirate copy – plus some episodes of animated series and also including some TV movies with the Ewoks when I was younger.

An Ewok

Who didn’t heard the story about Anakin Skywalker? The boy that was trained as Jedi by Obi-Wan Kenobi who eventually was turned – by his trend to the dark side of the force and also influenced by a Sith – into Darth Vader, who also was the father of Luke – the new hope to get the balance of the force – and the princess Leia and so on.

Probably I missed something, some of these movies are too long and actually, some characters make me sick. But the true is that these movies were a turning point in the world of the visual effects.

Star Wars - The Force Awakens poster

I won’t talk about the movies but the title of the seven episode “The Force Awakens” inspired me to write a new post.

The concept of the force, such energy field with source even genetic - cell with something call “midi-chlorian” - that empower the jedies knights to move objects with his mind, get great reflex and anticipation, and also great skills as pilots, is actually unreal but in some scenarios could reveal its presence.

Let me show you the way you can found an expression of the force as a result of an approach that allow you awake the team communication channel from a hypothetical scenario.

Common communication channels

On software development teams the communication is a key factor. Not a few agile practice talk about it including informative workspace, release the plan, sit together, pair programming, ubiquitous language or even poker planning. I have no doubt that every single agile practice is about to improve the communication in the team.

From a real life scenario communication channels could look like this way:

Common communication channels

The client tells to the product owner what he wants. The product owner can create a vision of the final product – optionally with a consultant - and share the expected result to the project lead in terms of requirements and also prioritize such requirements. Then project lead and the team should be able to create a roadmap in term of features and small increments to cover all client’s expectation.

The project lead must share systematically all his knowledge or the vision with the team looking for an effective and integrated solution. Notice that the project lead should be the only communication channel between each developer and the people that really knows what the client wants.

But the world isn’t perfect and the project lead also should be ready to manage the changes on requirements including say NO or defers some requirements. There is a single rule, every single decision must be taken in collective with the team.

Actually, as a developer, I always expect an effective communication system. It could be done through public the iteration plan, plus daily meetings, retrospective, and demos, everything that allow the whole team get an integrated view of the thing that they are building. An informative workspace where they can share the issues or impediments and, of course, enjoy the progress.

Such approach should work and actually does.

But sometimes something happens that drive the team to a complete failure of the communication system.

Breaking the communication channels

Allow me to twist a bit the previous scenario.

Assume that suddenly the product owner and the external consultant start to manage the team in a very odd way. They do direct tasks and components assignments in private instead thought the project lead. From another side, but not less important the project lead always saying YES to each the product owner request without team consulting. At some point, the product lead also stops to communicate with the team at all, probably just a few emails or comments about some new client expectation, but not in real team dynamics.

The issue here is about product owner and consultant vision. They don't want to know about small increments. They want the final solution and production ready now. They don't know about the team dynamics or how they manage the source or about components integration. They also expect for some magical entity do the coordination work. But such thing doesn’t exist in software development.

The result: a completely broken communication system.

Broken communication channels

Awaking the communication channels

Could be spent some time before you really notice the disaster. The bug corrections lead to other incongruousness or to regressions. Even worse, people sitting together can't found a real way to coordinate their work. But something remains immutable, no one talk about the system beyond a few individuals' virtual chatting sessions.

At this point, some practices could save them all. So, also assume that the team have a full dependency in a build automated system (just a build server), and have a well configured a deployment pipeline that run unit and integration tests and also run the deployment routines into quality assurance servers (including product owner's server).

You as team member could ask yourself: Why don't use this to trigger the communication back?

So, you as a team member with integration needs can start to write a lot of integration test. For instance, for each disagreement, for each incongruousness, every assumption, every of non-final decisions, everything that anyone told aloud, you can translate it into an executable code that immediately will run on the build server, and eventually this approach will lead to stopping the deployment pipeline due by the tons of errors.

This is a point when a sort of energy field (a.k.a 'the force') should start to play its role. The product owner needs upgrades. But the upgrades are locked because the deployment pipeline is locked. So, eventually, someone would need to start talking again.

Conclusion

There aren’t nothing more effective to improve – not only to awake – the communication in your team than the integration tests. That is because coding is actually the communication channel that never dies. Even better if you really use practice like continuous integration you are already empowered – without “midi-chlorian” – to awake you communication channels at any time.

By the way, while I´m writing this lines, I noticed that Multivision channel - on Cuba - started to broadcast the saga again. So, if you never saw it, it's a good timing ;)

Star Wars Episode I: The Phantom Menace at Multivision

Remember: Your focus determines your reality.
May the force be with you.

Thursday, August 13, 2015

Simplest way to implement a state machine approach for SharePoint list items

Introduction

As you can read in Wikipedia “A finite-state machine (FSM) or finite-state automaton (plural: automata), or simply a state machine, is a mathematical model of computation used to design both computer programs and sequential logic circuits. It is conceived as an abstract machine that can be in one of a finite number of states. The machine is in only one state at a time; the state it is in at any given time is called the current state. It can change from one state to another when initiated by a triggering event or condition; this is called a transition. A particular FSM is defined by a list of its states, and the triggering condition for each transition.”

SharePoint developers frequently face to state machine problems and the typical solution includes a workflow implementation. Actually, SharePoint includes default workflows for common scenarios, for instance Approval (route a document or item for approval or rejection), Collect Feedback (route a document or item for feedback), Collect Signatures (route a document, workbook, or form for digital signatures), Three-State (track an issue, project, or task through three states or phases) and Publishing Approval (automate content routing for review and approval) workflows.

But sometimes such workflows doesn’t fit exactly to your scenario or worse, the workflow usage is just an overkill.

Next, I’ll introduce you a clean state machine (also personal) approach to handling state transitions of SharePoint list items without workflows.

Event receivers as workflows alternative

As you should suppose the only way implement this is by using event receivers (a.k.a. event handlers) instead workflows. But how make a clean solution event handler based to implement a state machine.

Allow me show you how the final code looks like for a customized publication process based on this approach:

[StateMachine("State", typeof(PublicationRequestStateMachineValidator))]
public sealed class PublicationRequestStateMachineItemEventReceiver : StateMachineItemEventReceiverBase
{
 [State("Approved")]
 private void OnApproved(SPItemEventProperties properties)
 {
  /*...*/
 }

 [State("Rejected")]
 private void OnRejected(SPItemEventProperties properties)
 {
  /*...*/ 
 }

 [State("ReadyToBePublished")]
 private void OnReadyToBePublished(SPItemEventProperties properties)
 {
  /*...*/
 }

 [State("ReadyToBeUnpublished")]
 private void OnReadyToBeUnpublished(SPItemEventProperties properties)
 {
  /*...*/ 
 }
}

Notice the introduction of a few new classes:

StateMachineAttribute: To indicate the column to be monitored and its validator class. The example above indicates that the column name is "State" and the validator class is typeof(PublicationRequestStateMachineValidator).

StateAttribute: To indicate the event method that will be called when the state change to the specified value. For instance, the usage of [State("Approved")] means that the method OnApproved will be called when the item change to the "Approved" state.

StateMachineItemEventReceiverBase: Implements the base behavior of the state machine (invokes the validation to avoid not allowed transitions also call the event methods).

To make this work properly you must register you state machine item event receiver on the list that you want to monitor for state change. This must be done via RegisterEventReceiverIfRequired extension method for SPList. This method not only registers the event receiver the list, it also adds a custom column to the list to implement the change detection approach whether the event receiver inherits from StateMachineItemEventReceiverBase.

This could be done with the follow code in a feature activation for instance.

publicationRequestList.RegisterEventReceiverIfRequired(SPEventReceiverType.ItemUpdating, typeof(PublicationRequestStateMachineItemEventReceiver).Assembly.FullName, typeof(PublicationRequestStateMachineItemEventReceiver).FullName);
publicationRequestList.RegisterEventReceiverIfRequired(SPEventReceiverType.ItemUpdated, typeof(PublicationRequestStateMachineItemEventReceiver).Assembly.FullName, typeof(PublicationRequestStateMachineItemEventReceiver).FullName);

How validate state transitions?

Support transition validation is also a cool feature of this library. In order to validate the transition, you can inherit from StateMachineValidator class and type your own. Our custom publication request example the state machine validator looks like this:

public sealed class PublicationRequestStateMachineValidator : StateMachineValidator<string>
{
        public PublicationRequestStateMachineValidator()
        {
            this.AddAllowedTransition("WaitingForApproval", "Approved");
            this.AddAllowedTransition("WaitingForApproval", "Rejected");
            this.AddAllowedTransition("Approved", "ReadyToBePublished");
            this.AddAllowedTransition("ReadyToBePublished", "Published");
            this.AddAllowedTransition("Published", "ReadyToBeUnpublished");
            this.AddAllowedTransition("ReadyToBeUnpublished", "Unpublished");
        }
}

Now if someone tries to change the state from WaitingForApproval to Published - for instance - the change will be reverted automatically.

This is also useful to enable or disable some actions. Here are couple of pictures that depicts the enable state of the ribbon button depends on an allowed transition validation implemented in javascript in combination with a REST service.

Example A: Enabled because transitions are allowed

Example B: Disabled because transitions are not allowed

Conclusions

This post is an introduction to the new StateMachine.SharePoint library. This library allows you simplify the implementation of a state machine based approach to monitor and validate the states of SharePoint list items.

For now, you can build this library from its sources and deploy directly into your SharePoint farm or wait for the "top secret" weapon and forthcoming project PackageManager.SharePoint ;)