Recommender systems have quickly become part of our daily lives. We have grown accustomed to receiving recommendations for everything, be it the next song we listen to, the next movie we watch, or the next product we buy. We have so many options to choose from that even simple decisions, such as which movie to watch tonight, can be hard. With the help of Jens, we investigated several out-of-the-box recommendation services offered by AWS and also built our own recommendation solutions using the AWS machine learning capabilities.

This article provides a basic understanding of recommender systems and explains why they are worth studying. It also gives an overview and comparison, according to several criteria, of the AWS recommender services and our own recommenders. This should prove helpful for anyone who wants to build and use recommendation systems, whether for customer projects or out of personal interest.

The motivation for our research is that recommender systems are present in our daily lives and are used by countless companies to increase their revenue. More and more companies are starting to use their own recommender systems. Therefore, it is essential to build up knowledge and dive deep into the underlying details to be able to offer good consulting to our customers. We investigated several systems in order to find the best individual solution for upcoming projects at Opitz Consulting.

The following figure illustrates the main milestones of our project.

Timeline of project activities

Collaborative Filtering vs. Content-based Filtering

Recommendation systems are based on two main principles: collaborative filtering and content-based filtering. These two methodologies are best explained using an example in which a movie recommendation system aims to generate good movie recommendations for users based on their tastes.

In this example, the movies are the items to be recommended.

Content-based filtering:

Content-based filtering generates predictions based on the properties of the items it recommends. In our example, the recommender examines the properties of the movies to generate predictions. For instance, if a user watches a lot of Western movies, the system will recommend further movies belonging to the "Western" genre. [1]

Collaborative Filtering:

Recommender systems using collaborative filtering examine the similarity between users and/or items. For example, the system recommends items to a user that are liked by similar users. [1]
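To make the idea of "liked by similar users" concrete, here is a minimal sketch of user-based collaborative filtering. The users, movie titles, and ratings are made up for illustration, and picking only the single most similar user is a simplifying assumption, not the method used by any of the systems discussed in this article.

```python
from math import sqrt

# Toy ratings (hypothetical data): user -> {movie: rating}
ratings = {
    "alice": {"Western A": 5, "Western B": 4, "Sci-Fi A": 1},
    "bob": {"Western A": 5, "Western B": 5, "Western C": 5},
    "carol": {"Sci-Fi A": 5, "Sci-Fi B": 4, "Western A": 1},
}


def user_similarity(a, b):
    """Cosine similarity computed over the movies both users have rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][m] * ratings[b][m] for m in common)
    norm_a = sqrt(sum(ratings[a][m] ** 2 for m in common))
    norm_b = sqrt(sum(ratings[b][m] ** 2 for m in common))
    return dot / (norm_a * norm_b)


def recommend(user, top_n=2):
    """Recommend unseen movies that the most similar user rated highest."""
    others = [u for u in ratings if u != user]
    nearest = max(others, key=lambda u: user_similarity(user, u))
    unseen = {m: r for m, r in ratings[nearest].items() if m not in ratings[user]}
    return sorted(unseen, key=unseen.get, reverse=True)[:top_n]


print(recommend("alice"))
```

In this toy data, alice's ratings resemble bob's far more than carol's, so she is recommended the Western that bob liked and she has not yet seen.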

These approaches are often insufficient on their own, which is why they are combined with other methods. In addition, the field of deep learning is growing so fast that deep learning models are being developed for many different applications, including the generation of user recommendations. [1]

Having explained our motivation, we now present the services and methods we used: Amazon Personalize, Amazon SageMaker Autopilot (the AutoML service of AWS), our own auto-encoder, and our own content-based model. Finally, we compare all of them and share our experiences, to give readers an indication of which system best fits their use case.

Machine learning models are built and trained on data, with the goal of generalizing well to new, unseen data. Consequently, it is important to find good data sets before training the model, and they must be big enough to represent the domain in which the system will later be used. We used the same data set for all models in order to evaluate and compare them accurately. Specifically, we used the ratings table from the MovieLens data set, which contains 27 million ratings by 280,000 users on 58,000 movies [3]. Because we did not want to restrict ourselves to collaborative filtering, we also used a Kaggle data set that provides additional information about every movie, such as genre, actors, and title [2]. Our goal was to take this additional information into account when generating user predictions.

Using a Service: Amazon Personalize

Amazon Web Services (AWS) provides a wide variety of services, and its machine learning services in particular are growing fast. Amazon Personalize is the service aimed directly at recommendation systems. You provide the data in the required format, and Amazon Personalize takes care of training models and creating an endpoint that returns the top recommendations for a given user ID.

We tried the following three models: HRNN, HRNN metadata, and Popularity Count.


HRNN

HRNN stands for hierarchical recurrent neural network and uses only a data set of user ratings for specific movies (USER_ID, ITEM_ID, RATING, TIMESTAMP). It can weight recently rated movies more heavily than interactions that took place further in the past, which allows the model to adapt when someone's taste in movies changes over time. This may seem unusual for movies, but it makes sense for shopping, for example, where single customers buy different things than they might some years later when they have a family.


HRNN metadata

Here, the underlying model is also a hierarchical recurrent neural network. In addition to the ratings, it uses additional information about the users and the items to generate recommendations. For this model, we used the second data set to provide additional information about every movie. Training takes longer because the training data is larger. According to Amazon, it achieves better results when it receives high-quality metadata about users and items.

Popularity Count

While the previous two models are deep learning models that have to be trained, the popularity count is the simplest one and does not need to be trained, so it can be generated very quickly. Despite the service name "Personalize", the popularity count does not provide a personalized solution. It simply recommends the items that were rated most often. For an item to be recommended, it is sufficient that it was rated frequently, not necessarily positively. This model is obviously limited but might still provide some good recommendations based on item popularity.
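The behaviour described above can be sketched in a few lines. This is only an illustration of counting ratings per item on made-up data, not the actual implementation inside Amazon Personalize.

```python
from collections import Counter

# Hypothetical interactions: (user_id, item_id, rating)
interactions = [
    ("u1", "m1", 5), ("u2", "m1", 1), ("u3", "m1", 2),
    ("u1", "m2", 5), ("u2", "m2", 4),
    ("u3", "m3", 5),
]


def most_rated(interactions, top_n=2):
    """Return the item ids with the most ratings, ignoring the rating values."""
    counts = Counter(item_id for _, item_id, _ in interactions)
    return [item_id for item_id, _ in counts.most_common(top_n)]


print(most_rated(interactions))
```

Note that "m1" ranks first here although two of its three ratings are low, which mirrors the limitation mentioned above: frequency counts, not rating quality.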

All in all, Amazon Personalize is easy to use. The user only has to provide the data sets in the format required by Amazon Personalize, together with a schema describing each data set.

Pre-processing the data into the desired format is time-consuming, but afterwards Personalize is easy and fast to use. The time needed for pre-processing depends on the given data set. Customer data sets usually require heavy data preparation because of the strict guidelines of Amazon Personalize. As a side note, Amazon Personalize only returns a list of recommended items for a given user; it does not provide a predicted rating.
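As a rough illustration of this preparation step, the sketch below renames the MovieLens rating columns (`userId`, `movieId`, `rating`, `timestamp`) to the column names mentioned above and defines a matching schema. The schema follows the Avro-style layout used by Amazon Personalize for interaction data; treating `RATING` as an extra float field is an assumption taken from the column list in this article and should be checked against the current Personalize documentation. The helper name `to_personalize_csv` is hypothetical.

```python
import csv
import io
import json

# Avro-style schema for the interactions data set.
# The RATING field is an assumption based on the columns named in the article.
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "RATING", "type": "float"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}


def to_personalize_csv(movielens_csv):
    """Convert MovieLens ratings CSV text to the column names used above."""
    reader = csv.DictReader(io.StringIO(movielens_csv))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["USER_ID", "ITEM_ID", "RATING", "TIMESTAMP"])
    for row in reader:
        writer.writerow(
            [row["userId"], row["movieId"], float(row["rating"]), int(row["timestamp"])]
        )
    return out.getvalue()


sample = "userId,movieId,rating,timestamp\n1,307,3.5,1256677221\n"
print(to_personalize_csv(sample))
print(json.dumps(INTERACTIONS_SCHEMA, indent=2))
```

For the full ratings table, the same renaming would more realistically be done with a data-frame library rather than row by row; the standard-library version is used here only to keep the sketch self-contained.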

Using AutoML: Amazon SageMaker Autopilot

The other AWS service we used is Amazon SageMaker Autopilot, the AutoML module of Amazon SageMaker. It is not specific to recommender systems but is a general-purpose module that can be applied to many machine learning problems.

Here, Autopilot decides autonomously whether the problem is a classification or a regression problem and then creates and trains several pipelines with different models. In the end, the best pipeline can easily be deployed, and predictions are generated by sending requests to the deployed endpoint. This differs from Amazon Personalize, where the recommendations can be obtained directly from the GUI; on the other hand, Autopilot creates more pipelines.

The best model according to Autopilot was an XGBoost model [8], but we also evaluated a linear learner model to see whether even a linear model is able to fit the data equally well. We were also interested in how much the pre-processing of the data affects the generated movie predictions. This is why we first fed all the available data (see Figure 1) to Autopilot and then only some pre-selected columns (see Figure 2). In the latter case, we discarded all columns that were either empty for most rows or did not seem to us to contain additional useful information. After evaluating the linear model and the XGBoost model with both configurations, we found that, for our problem, neither the model nor the pre-processing significantly affects the output.

All things considered, using Autopilot takes more effort than using Amazon Personalize, since the inference part has to be done in code. In general, however, it is still easy to use, and the user does not need to choose a model or even decide whether the task is a regression or a classification problem.

Fig. 1: Provided movie metadata of the IMDb data set
Fig. 2: Movie information which is taken into account for Amazon SageMaker Autopilot

Self-coded: Auto-Encoder

After experimenting with the services presented above, we set out to create our own recommendation system, aiming to beat the recommendations of AWS. The first approach we tried was an auto-encoder, which encodes the user-item ratings into a subspace and then reconstructs the lower-dimensional data back into the original space. To achieve this, an auto-encoder is built as a deep learning model consisting of an encoder part, with a stepwise decreasing number of neurons per hidden layer, and a decoder part, with a stepwise increasing number of neurons per hidden layer. The numbers of neurons in the input and output layers are equal. The idea is to capture the relation between a user and their ratings by forcing the model to learn lower-dimensional features instead of using all the movie ratings directly.

In general, an auto-encoder is used to learn a lower-dimensional representation of an input. It transforms a given input into a lower-dimensional subspace, from which it reconstructs the compressed data back into the original space. As a result, the model no longer represents a user by every single movie rating; instead, it must learn features that describe how movies are related to each other. The parameters of the model are learnt by minimizing the loss between the original input and the reconstructed output of the auto-encoder, where the input consists of the rows of the user-item matrix shown in Figure 3.
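Written out, this training objective could look as follows. The symbols are introduced here only for illustration: $r_u$ is the rating row of user $u$, $f_\theta$ the encoder, $g_\phi$ the decoder, and $\Omega$ the set of (user, movie) pairs with a known rating. Using the mean squared error and restricting the sum to known ratings are common choices and are assumptions here; the article does not state which loss was used.

```latex
\hat{r}_u = g_\phi\bigl(f_\theta(r_u)\bigr),
\qquad
\mathcal{L}(\theta, \phi) = \frac{1}{|\Omega|} \sum_{(u,i) \in \Omega} \bigl(r_{ui} - \hat{r}_{ui}\bigr)^2
```

Restricting the sum to $\Omega$ keeps the model from being penalized for the empty cells of the user-movie matrix, which correspond to movies a user has not rated.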

Figure 3 [4] shows an exemplary user-movie matrix with the ratings of every user for all movies. This matrix is used to train the model; each row corresponds to a different user. After learning the parameters of the model, it is possible to do inference: we fed the row vector of a single user, covering all movies, into the auto-encoder, and the auto-encoder generated ratings for every movie. Figure 4 [5] illustrates an exemplary architecture of an auto-encoder, where it can easily be seen how the input data is first transformed into a lower-dimensional subspace before being transformed back into the original space. [6]
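The following sketch shows only the shape of such a network: a forward pass through an untrained auto-encoder with random weights. The layer sizes, the ReLU activation, and the use of 0.0 for "not rated" are assumptions chosen for illustration; a real model would be trained with a deep learning framework and would have one input neuron per movie.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weight matrix (n_out x n_in) and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation):
    """Apply one dense layer: activation(W x + b)."""
    weights, biases = layer
    return [
        activation(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def build_autoencoder(sizes, seed=0):
    """Create one dense layer per consecutive pair of layer sizes."""
    rng = random.Random(seed)
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(layers, x):
    """Return the activations of every layer, including the input itself."""
    activations = [x]
    for i, layer in enumerate(layers):
        act = identity if i == len(layers) - 1 else relu
        activations.append(dense(activations[-1], layer, act))
    return activations


# Encoder shrinks 6 -> 4 -> 2, decoder grows 2 -> 4 -> 6 (hypothetical sizes).
sizes = [6, 4, 2, 4, 6]
layers = build_autoencoder(sizes)
user_row = [5.0, 0.0, 3.0, 0.0, 0.0, 4.0]  # one user's ratings; 0.0 = not rated
activations = forward(layers, user_row)
code = activations[2]            # lower-dimensional representation of the user
reconstruction = activations[-1]  # one predicted value per movie
print([len(a) for a in activations])
```

After training, the entries of `reconstruction` that belong to unrated movies would serve as the predicted ratings from which recommendations are picked.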

Fig. 3: User-movie matrix containing user ratings for watched movies and empty cells for movies which haven't been watched yet [5]
Fig. 4: Exemplary structure of an auto-encoder encoding the input data into a lower-dimensional subspace (red) and reconstructing this representation in the original space [5]

In summary, our auto-encoder tends to recommend movies that have been rated very often. Making the model less sensitive to the most-rated movies would require additional effort and time for fine-tuning.

Self-coded: NLP content-based recommender

Since the auto-encoder follows the idea of collaborative filtering, we also wanted to build a content-based recommender system ourselves. Because we all like natural language processing (NLP), we decided to analyze the contents of the movies and to recommend movies which are similar to popular movies.

We selected the columns that, in our opinion, describe the movies best: plot description, title, genre, director, and actors.

Fig. 5: Movie information which is taken into account for NLP content-based recommender

From these columns, we created a concatenated string for every movie and used a vectorizer to obtain a numerical representation of every movie. This is necessary in order to compute the cosine similarity between every movie vector and every other one. The cosine similarity is the cosine of the angle between two given vectors and therefore measures how similar the two vectors are. We obtained a matrix (shown in Figure 6 below [6]) containing the similarity of every movie-movie pair. The idea of our NLP-based movie recommendation system is that movies which have some actors or directors in common, or which deal with a similar topic, have similar vector representations and therefore a high cosine similarity.
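A minimal sketch of this pipeline is shown below. The movie strings are invented, and the simple bag-of-words counter is only a stand-in for whichever vectorizer was actually used; the article does not name it.

```python
import math
import re
from collections import Counter

# Hypothetical concatenated strings (plot, title, genre, director, actors)
movies = {
    "Movie A": "western sheriff duel desert director_x actor_y",
    "Movie B": "western outlaw revenge desert director_x actor_y",
    "Movie C": "space crew alien ship director_z actor_w",
}


def vectorize(text):
    """Bag-of-words term counts; a stand-in for the vectorizer in the article."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = {title: vectorize(text) for title, text in movies.items()}

# Movie-movie similarity matrix as a nested dict
similarity = {
    a: {b: cosine_similarity(vectors[a], vectors[b]) for b in vectors}
    for a in vectors
}

for title, row in similarity.items():
    print(title, {other: round(value, 2) for other, value in row.items()})
```

Movie A and Movie B share genre, setting, director, and actor tokens and therefore score much higher with each other than with Movie C, with which they share no terms at all.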

Fig. 6: Movie-movie similarity matrix, showing how similar two movies are. The higher the value, the more similar the movies are [6]

This cosine similarity is used to generate movie predictions by taking the top twenty movies from our data and always recommending the most similar movie to each one. Another interesting and promising approach would be to take only the ten most popular movies and recommend the two most similar movies to each one.

After inspecting our recommendations, we noticed that the system tends to recommend movies by the same director.

The main effort when implementing this approach was deciding which movie data to consider and then bringing it into a string format that the vectorizer can process. There is still potential for improvement: one could use a different vectorizer to represent the movies or take different data into account. [7]
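The selection step can be sketched as follows, given a precomputed similarity matrix. The matrix values and movie names are made up, the function name `recommend_similar` is hypothetical, and excluding the seed movies themselves from the results is a design assumption.

```python
def recommend_similar(similarity, seed_movies, per_seed=1):
    """For each seed movie, pick the `per_seed` most similar non-seed movies."""
    recommendations = []
    for seed in seed_movies:
        candidates = sorted(
            (m for m in similarity[seed] if m != seed and m not in seed_movies),
            key=lambda m: similarity[seed][m],
            reverse=True,
        )
        for movie in candidates[:per_seed]:
            if movie not in recommendations:  # avoid duplicate recommendations
                recommendations.append(movie)
    return recommendations


# Hypothetical movie-movie cosine similarities
similarity = {
    "A": {"A": 1.0, "B": 0.7, "C": 0.1, "D": 0.2},
    "B": {"A": 0.7, "B": 1.0, "C": 0.3, "D": 0.6},
    "C": {"A": 0.1, "B": 0.3, "C": 1.0, "D": 0.4},
    "D": {"A": 0.2, "B": 0.6, "C": 0.4, "D": 1.0},
}

print(recommend_similar(similarity, ["A", "C"], per_seed=1))
print(recommend_similar(similarity, ["A"], per_seed=2))
```

The first call corresponds to the "one most similar movie per seed" variant, the second to the alternative of taking fewer seeds and two similar movies for each.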


Along the way, we came across some technical and methodological challenges. We needed to join both data sets in order to take as much information as possible into account, so that our models would perform well. The joined data set contains around 8 GB of data, which was too large for us to process with AWS SageMaker. Therefore, we had to reduce the data set in a reasonable way.

Additionally, it is important to pay attention to how the data is processed. Iterative approaches should be avoided wherever possible, because they take too much time and, in the end, also cost money, since the processing has to run on a large machine in SageMaker. As a consequence, we learnt to use pandas efficiently for data processing. With more time and budget, it would be possible to develop better self-coded models, because the current models are very basic. There is still huge potential to improve them.


The following figure shows our subjective evaluation of the different methods. We use a grading system where 5 corresponds to the worst grade and 1 to the best. Amazon Personalize can be used to build recommendation systems in a fast and uncomplicated way, since the user only has to pre-process the data to satisfy the requirements of Amazon Personalize. More complex problems may require more individual solutions. Furthermore, better results might be obtained by using self-coded approaches, but this also requires more time in research as well as in model development.

Fig. 7: Grading system
Fig. 8: Evaluation of the presented methods using a grading system from 1 (good) to 5 (bad)


[1] Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. 2014. Mining of Massive Datasets (2nd ed.). Cambridge University Press, USA.



[4] Tommaso Di Noia and Vito Ostuni. 2015. Recommender Systems and Linked Open Data. doi:10.1007/978-3-319-21768-0_4.





