TDWI 2017 – A Field Report from the OC Young Guns

From June 26 to 28, 2017, OPITZ CONSULTING was represented with a strong team at the TDWI conference in Munich. The TDWI conference is an industry meeting point of the BI community with more than 1,200 BI experts, where most of the well-known companies of the BI industry come together to exchange expert knowledge. With OC involvement there were three exciting talks on the following topics:

Particularly worth highlighting is the last talk on the list, which was given in cooperation with our customer Miller Brands Germany GmbH (MBG).

Already on the first day we were able to follow a number of interesting talks, especially on trend topics such as big data and analytics.

We closed the first conference day by attending the keynote of Robert Schröder, who spoke in a captivating way on "The difference between mistakes and failure – the evolution of the safety culture in aviation".

 

Talk by Robert Schröder: "The difference between mistakes and failure"

In his talk, Mr. Schröder very skillfully drew comparisons between the safety culture in aviation, BI projects and everyday life. After this insightful and exciting special keynote it was time for the TDWI Welcome Reception. During this relaxed evening event, OC, together with its customer MBG, provided plenty of drinks and finger food.

On the second conference day we decided to attend different slots in order to pick up as much knowledge from the experts as possible. Afterwards we briefed each other on the exciting talks, including an assessment of their content.

 

Tom Gansor of OPITZ CONSULTING in front of the MBG booth

This day also saw our Young Guns annual meeting, which took place again one year after its founding, but this time with us. 🙂 We looked back at what has been achieved so far and set further goals for the future. Stay tuned for the barcamps taking place in Hamburg and Stuttgart later this year.

On the third and final conference day we had the chance to take part in the hackathon. This time the question was whether it is possible to generate new and, above all, usable insights from unstructured data in just a few hours in an automated way using "cognitive computing". Conclusion: yes, this is quite possible with the services IBM provides. A detailed report can be read here.

All in all, we look back on a very successful and instructive conference with many exciting one-on-one conversations.

Best regards

Fabian and Christian

 

Image sources: https://www.sigs-datacom.de

Posted in Analytics & BigData, News, OC Inside, Uncategorized

AWS News Week 29

New GPU-powered EC2 instances (G3)

The new G3 instance type is now available in the regions US East (Ohio), US East (Northern Virginia), US West (Oregon), US West (Northern California), AWS GovCloud (US), and EU (Ireland).

The specifications of the three new instance sizes are as follows:

Model        GPUs   GPU Memory   vCPUs   Main Memory   EBS Bandwidth
g3.4xlarge   1      8 GiB        16      122 GiB       3.5 Gbps
g3.8xlarge   2      16 GiB       32      244 GiB       7 Gbps
g3.16xlarge  4      32 GiB       64      488 GiB       14 Gbps

GPU-powered EC2 instances are a good choice for 3D rendering and visualization, virtual reality, video encoding, machine learning and much more.

More details on the new instance types can be found here.
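If you want to try out the new instance type, launching a G3 instance works like any other EC2 launch. A minimal AWS CLI sketch, assuming the CLI is already configured; the AMI ID, key pair and region are placeholders:

# Launch a single g3.4xlarge instance (replace the placeholder AMI and key pair)
aws ec2 run-instances \
  --region eu-west-1 \
  --image-id ami-xxxxxxxx \
  --instance-type g3.4xlarge \
  --count 1 \
  --key-name my-keypair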

Lambda@Edge available

With Lambda@Edge, a Lambda function can react to CloudFront events close to the end user. Possible use cases include:

  • evaluating cookies and URL rewriting for A/B testing
  • serving specific objects based on the User-Agent header
  • implementing access control by evaluating specific headers before calling the actual origin

More details on Lambda@Edge can be found here.

Server-Side Encryption for Amazon Kinesis Streams

The data of a Kinesis stream can now be encrypted server-side. Encrypting data is important in many use cases in order to meet regulatory requirements.

Kinesis streams are encrypted with the AES-256-GCM algorithm. Encryption can be started, stopped and changed for every existing and new stream via the AWS console and the AWS SDK.

More details on SSE for Kinesis streams can be found here.
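Enabling encryption for an existing stream is a one-liner, for example via the AWS CLI. A minimal sketch, assuming the stream already exists; instead of your own CMK you can also use the AWS-managed key alias/aws/kinesis:

# Enable server-side encryption with a KMS key for an existing stream
aws kinesis start-stream-encryption \
  --stream-name my-stream \
  --encryption-type KMS \
  --key-id alias/aws/kinesis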

Resource Data Sync (S3 Sync) for EC2 Systems Manager

Amazon's EC2 Systems Manager is a management service for creating system images, collecting inventories, managing Windows and Linux configurations and applying system patches.

With Resource Data Sync (S3 Sync) it is now possible to automatically aggregate the collected inventory data in S3. Queries can then be run on the collected data with Amazon Athena, and the data can also be visualized with Amazon QuickSight if required.

More details on Resource Data Sync can be found here.
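Such a sync can also be set up via the AWS CLI. A minimal sketch; the sync name, bucket and region are placeholders:

# Continuously aggregate SSM inventory data into an S3 bucket
aws ssm create-resource-data-sync \
  --sync-name my-inventory-sync \
  --s3-destination BucketName=my-inventory-bucket,SyncFormat=JsonSerDe,Region=eu-west-1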

Posted in Uncategorized

Factsheet OC|LeanEAM

One offering from the service portfolio of our CC Strategy is OC|LeanEAM. Our recently published factsheet

http://www.opitz-consulting.com/fileadmin/user_upload/Collaterals/Fact_Sheet/83-factsheet-lean-eam.pdf

provides a good overview of our offering in the area of Enterprise Architecture Management (EAM).

Unfortunately, the topic of EAM has negative connotations for many customers, as it is often perceived as too complex and cumbersome. With the OC|LeanEAM offering, OPITZ CONSULTING (OC) wants to present a lightweight and pragmatic method that supports companies in implementing their strategy. If required, OC|LeanEAM can be complemented by other offerings from the CC Strategy, such as the Digital Awareness Workshop and Change Facilitation.

With EAM, holistic views of the company can be developed that enable well-founded and faster decisions while taking strategic, business-relevant and technological aspects into account.

The following analogy to urban planning helps to put the topic of enterprise architecture into perspective:

EAM_Analogiie

The following whitepaper offers further information:

http://www.opitz-consulting.com/fileadmin/user_upload/Collaterals/Artikel/whitepaper-it-transformation-with-enterprise-architecture_sicher.pdf

Posted in Strategy & Methods

OPITZ CONSULTING at the B2RUN

As every year, a team from OPITZ CONSULTING was at the start of the B2RUN in Munich in 2017. Sixteen running enthusiasts from Munich and Nuremberg switched off their computers a little earlier than usual to take part in Germany's largest corporate run with around 30,000 participants.

Whether from the office, from the home office or straight from the customer's site: on July 13 everyone met at the Olympic Stadium to get in the mood for the run together. Some family members had also come along to provide moral support. Around 7 p.m. the starting gun was finally fired. Now it would become clear whether the training of the past weeks and months had paid off.

OC B2RUN 2017

A six-kilometre course winding through Munich's Olympic Park to the finish in the Olympic Stadium had to be covered. All members of the OPITZ CONSULTING team made it to the finish line. Some were happy about a new personal best, and five team members even broke the 30-minute mark.

The evening wound down with bananas, buttered pretzels and the odd beer. Next year we will of course be back at the B2RUN Munich!

Posted in OC Inside

Thoughts about Using APIs in IoT Scenarios

The number of IoT (Internet of Things) devices is increasing rapidly day by day. According to Gartner, 8.4 billion connected things will be in use worldwide this year alone, an increase of more than 30 percent compared to the previous year. IT research firms expect this number to grow much further, to anywhere from 21 billion (Gartner) to 37 billion (Cisco) by 2020! To get an idea of how much our world is going to change, it is worth mentioning that all of these forecasts do not even include PCs, tablets and smartphones.

IoT projects involve not only a large number of devices but also various application programming interfaces (APIs) and huge amounts of data. The easiest and most efficient way to interact with all of these are APIs.

IoT devices are connected to data located in cloud-based services. APIs act like a sky bridge between the two worlds: IoT solutions on one side, data and capabilities on the other. APIs make IoT useful; they turn limited little things into powerful gateways to new possibilities. You can think of an API as an interconnector that provides the interface between the global network and the things. APIs expose the data that allows multiple devices to be combined and connected to solve new and interesting workflows. For that reason, an easy way of communicating has to be ensured, managed in an efficient and secure system. It must not be forgotten that new IoT devices also bring considerable risk, giving hackers and cyber criminals more opportunities to attack. The nature of electronic devices operated by human beings, plus the importance of some of the things that connect to each other (satellites, traffic lights, vehicles), raises critical issues as well. That is the main reason why APIs need to be well protected.

APIs are a fundamental enabler of the Internet of Things, but without a management system, IoT devices can easily lead to catastrophe, especially when it comes to:

  • versioning and support for the devices in use
  • managing developers and device registrations
  • device visibility and analytics
  • performance and scalability
  • full control of permissions

What is API Management in the IoT area?

It is a set of technologies and processes for creating, managing, securing, analyzing and scaling the APIs of IoT-connected devices. A developer portal enables companies to provide everything that internal, partner and third-party developers need to be effective and productive when building on these APIs.

APIs allow devices to talk to each other in a consistent and structured way that makes it easy to get them to communicate. When talking about IoT, we also have to mention the wide variety of protocols (CoAP, XMPP, MQTT, WAMP, OMG DDS, STOMP) and frameworks (WebRTC, ASP.NET SignalR, webSocket.org, Couchbase, Socket.IO, Meteor), which means that an IoT API management system also has to integrate the IoT device environment. The next very important issue is security. An API management system creates a secure, user-friendly identity and ensures secure connections to devices across mobile and IoT environments. In addition, with such a platform it is much easier to identify and neutralize SQL injection, DoS attacks and other online threats.

Future world example

To show how important the use of APIs and API management is, let us take a look at an example. According to Intel's predictions, the coming flood of data in autonomous vehicles will reach a level of roughly 4,000 GB per day. Add to that the estimate by "BI Intelligence" that there will be 10 million self-driving cars on the road by 2020[1]. These figures show how important it will be to ensure correct and stable communication in the IoT world.

Source: http://images.techhive.com/images/article/2016/12/autonomous-vehicle-data-intel-100697604-large.jpg

Imagine a situation in which a car needs to send data as quickly as possible after an accident. The best way to do this would be an API-based solution.

Example payload

Most web applications support RESTful APIs, which rely on HTTP methods. The open source Swagger framework helps to design and maintain APIs. The framework provides the OpenAPI Specification for creating RESTful API documentation formatted in JSON or YAML. In this article the YAML variant is shown because it is much easier to read.
swagger.YAML
Listing 1: An example of Swagger API documentation.

Thanks to the API definition, a car can easily communicate with an emergency center after an accident, using the following HTTP methods to send its data to the data center.

POST
/accident

Add a new accident to emergency center

Although the car processes a lot of data locally, it should only send the most valuable parts of it. The payload shouldn't be too big, so that it does not slow down communication and can be processed quickly. In this very simple example, let's imagine that the car sends the following information:

  • car model
  • car number plate
  • accident time
  • owner name
  • GPS location
  • average speed
  • number of passengers
  • accident status
  • passenger injuries

In reality, a vehicle could send much more data, which would be crucial in an emergency center where an artificial intelligence can decide what kind of help should be sent.

Example Value Model of POST method

{
  "carModel": "Toyota Yaris II 2014",
  "carNumberPlates": "D65261",
  "accidentTime": "1985-04-12T23:20:50.52Z",
  "owner": "Wojtek Konowal",
  "location": "[47.4925, 19.0513]",
  "averageSpeed": 110,
  "passengers": 1,
  "status": "registered",
  "injury": "Serious"
}
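Purely as an illustration, such a request could be sent with curl. The endpoint https://api.emergency-center.example is a made-up placeholder; the payload is the one shown above:

# Report a new accident to the (hypothetical) emergency-center API
curl -X POST "https://api.emergency-center.example/accident" \
  -H "Content-Type: application/json" \
  -d '{
        "carModel": "Toyota Yaris II 2014",
        "carNumberPlates": "D65261",
        "accidentTime": "1985-04-12T23:20:50.52Z",
        "owner": "Wojtek Konowal",
        "location": "[47.4925, 19.0513]",
        "averageSpeed": 110,
        "passengers": 1,
        "status": "registered",
        "injury": "Serious"
      }'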

After receiving first aid, the damaged vehicle can quickly and easily update its status, which can indicate that a passenger's health is getting worse, so that the medical center can decide, for example, to send a helicopter to get the injured passenger to hospital faster. This function is provided by the PUT method.

PUT
/accident

Update an existing accident

{
  "carModel": "Toyota Yaris II 2014",
  "carNumberPlates": "D65261",
  "accidentTime": "1985-04-12T23:20:50.52Z",
  "owner": "Wojtek Konowal",
  "location": "[47.4925, 19.0513]",
  "averageSpeed": 110,
  "passengers": 1,
  "status": "Service is pending",
  "injury": "Critical"
}

API solutions can also be used to gather data in a big data center, which can later find patterns in thousands of accidents and use machine learning to train an artificial intelligence to send better help to the accident location next time.

GET
/accidents/all

Returns all accidents by status

We need to remember that proper API design cannot be done without implementing an efficient API management solution. The example described above is directly connected with saving people's lives. Every car should use OAuth2 to authorize device access. A managed API also ensures proper access control, request routing, buffering, statistics collection, monitoring, alerting and decision making.
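As a sketch of what OAuth2-protected device access could look like on the wire: the device first obtains an access token via the client-credentials grant and then calls the managed API with it. Both endpoints and the credentials are made-up placeholders, and jq is assumed to be installed for parsing the token response:

# Obtain an access token (hypothetical authorization server and client credentials)
TOKEN=$(curl -s -X POST "https://auth.emergency-center.example/oauth/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=car-telematics-unit" \
  -d "client_secret=REPLACE_ME" | jq -r '.access_token')

# Call the managed accident API with the bearer token
curl -X POST "https://api.emergency-center.example/accident" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  --data-binary @accident.json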

Summary

It is obvious that APIs are becoming crucial in the newly developing IoT world, but these APIs must be managed to achieve optimal results. API management helps a company to monitor, analyze, report and apply policies. Without an effective API management strategy, an organization cannot properly control the usage of its APIs, and if a company neglects this, it becomes a threat not only to its business but also to its users and customers.

Posted in IoT & Industry 4.0

How to “Apache Solr”: an Introduction to Apache Solr

Foreword: The information society has brought people a lot of information. On the one hand we enjoy endless information, on the other hand we also feel a bit lost: how can we quickly and accurately filter the useful information out of everything the internet provides? This is a problem the IT world urgently needs to solve.

The emergence of full-text searching technology provides powerful tools to solve the information retrieval problem: Apache Solr is one of those tools.

This article deals with many aspects of Apache Solr – feel free to take a quick break from reading this blog occasionally and start practicing on your own! We’re sure that this can help you in getting an even more comprehensive understanding of Apache Solr.

In this article, we will talk about how to install the latest version of Apache Solr and tell you how to configure it properly. In addition, we will also tell you how to index a sample data file using Solr. Apache Solr supports different formats, including various databases as well as PDF, XML or CSV files etc. In this post, we will look at how to index data from an XML file. At the end of the article we will show you how to present your searching results using a web browser.

We leverage a Linux operating system in this article to run the examples (using Solr in a Windows environment shouldn't make a big difference whatsoever). One more thing: before you start the Solr installation, make sure that you have installed a JDK 1.8 or above and that the JAVA_HOME environment variable is set properly.

Why use Apache Solr?

Apache Solr is an open source enterprise search server, implemented in Java and thus easy to extend and modify. It uses the Lucene Java search library at its core for full-text indexing and search. The server communication uses standard HTTP and XML but also supports REST APIs and JSON. Solr supports both schema and schemaless configuration without Java coding, and it also has a plugin architecture to support more advanced customization. Apache Solr provides a wide variety of features; we just list some of the most important ones here:

  • Full-text searching function
  • Support of XML, JSON and standard HTTP communication
  • Highly scalable and fault tolerant
  • Faceted searching and filtering
  • Support of many major languages like English, German, Chinese, Japanese, etc.
  • Rich document parsing

Solr vs. Lucene

Solr and Lucene are not competing; on the contrary, Solr depends on Lucene: the core technology of Solr is implemented using Apache Lucene. It is important to note that Solr is not a simple wrapper around Lucene; the functionality it provides goes far beyond the capabilities of Lucene itself. A simple way to describe the relationship between Solr and Lucene is that of a car and its engine. And it is usually better to drive a car than just an engine, right?

For more details about Lucene please see this blog article from one of my colleagues: Apache Lucene Basics

The homepage for Apache Solr can be found here: http://lucene.apache.org/solr/

Installing Apache Solr

Requirement: Linux or Windows system with JDK 8 or above installed

wget http://mirror.softaculous.com/apache/lucene/solr/6.6.0/solr-6.6.0.tgz
tar -zxvf solr-6.6.0.tgz
cd solr-6.6.0

Architecture of Apache Solr

Once the Solr archive is downloaded, extract it into a folder (solr-6.6.0). The extracted folder will look as depicted in the following picture:

HowtoSolr1

The folder system of Solr is arranged as follows:

  • The bin folder contains scripts which are used to start and stop the server; we can also use the command “status” to check server status.
  • The example folder contains several sample files. Some tutorials leverage one of these to show how Solr indexes data. But as we want to create our examples step by step and use curl via HTTP REST service we don’t use this folder in this article.
  • The server folder is the most important folder. It contains the logs folder – all the Solr logs are written into this folder. This will surely help us in checking for any errors during the indexing process. The solr folder under the server folder contains one sub-folder for each collection or core that is managed by the server. We will go into more detail on those two artefacts later on. The configuration of Core and Collection elements is contained in a file called "managed-schema"; there is one such file for each Core or Collection.

Let us now have a look at the big picture of the Solr architecture:

HowtoSolr2

A client can use REST clients such as cURL, wget or Postman, as well as native clients (available for many programming languages), to communicate with Solr. Solr then sends the corresponding command to the Lucene core to

  • Index
  • Update
  • Delete
  • Search

The managed schema file is used to configure a core or collection.

Let's assume the following example: if we want to analyze log data from an Apache HTTP Server in our Lucene core, we should first start the Solr server, create a core in Solr and then add some useful fields to the managed schema file of the core. Those fields could for example include

  • Timestamp (as unique key)
  • Error_Code
  • Error_Message etc.

The next step is to implement an application which sends the log data from the Apache server to Solr and indexes the documents by using the Lucene core. Once the data has been indexed, Solr can be queried (using its REST API, for instance).

We will have a more detailed look on the necessary steps in the rest of the article.

Start and stop Solr

Apache Solr has a built-in Jetty Server, and we can use command-line scripts to start the Solr server. Just go to the bin directory under folder solr-6.6.0, and enter the following command into the command window.

solr start

This command will start the Solr server using default port 8983 in our local environment. Now we can open the following URL and validate in the browser that our Solr instance is running.

firefox http://localhost:8983/solr/


HowtoSolr3

This opens the main page of the Solr admin server. For further information on the admin server please visit https://cwiki.apache.org/confluence/display/solr/Overview+of+the+Solr+Admin+UI. As can be seen from the red box on the left that says "No cores available", we haven't created any cores on our Solr server yet. So our next task is to create a new core.

Checking the server status can be done using the command:

solr status

To stop the server, please use the following command.

solr stop

Create a core

Before we start to create a core, you should first understand the different artefacts that Solr manages (for example, "documents" as the basic unit of information).

Please use this link to first understand some basic definitions in Solr: https://cwiki.apache.org/confluence/display/solr/Overview+of+Documents%2C+Fields%2C+and+Schema+Design

Once you have a basic understanding of the Solr definitions we are ready to create our first Core. Cores and Collections are similar to a table in a relational database – whether a Core or a Collection is created depends on whether Solr is running in standalone mode (Core) or SolrCloud mode (Collection). Since we started the Solr server in standalone mode a few steps earlier, we will just be creating a Core later on. More details on the usage of SolrCloud will be discussed in one of my next blog posts.

In this article we use XML files containing employee data as input and want to analyze and query this data using Apache Solr. First of all we must create a core called "employeedata".

Let’s go to the bin directory under folder solr-6.6.0, and enter the following command in the command window:

solr create -c employeedata

The solr create command detects which mode Solr is running in, and then takes the appropriate action (either creating a Core or a Collection).

For more details and more options please enter this command:

solr create_core -help
solr create_collection -help

We can see the output below in the command window:

Creating new core 'employeedata' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=employeedata&instanceDir=employeedata
{
  "responseHeader":{
    "status":0,
    "QTime":822},
  "core":"employeedata"}

In the Apache Solr admin server we will now see our core in the pulldown list at the end of the left-side menu. Clicking on the core name shows additional menus with information about our employee core, such as a Schema Browser, Config Files, Plugins & Statistics, and the ability to perform queries on indexed data.

HowtoSolr4

Modify the managed schema file

The employee data we use in our example looks like the following picture:

HowtoSolr5

As you can see in Employee.xml, the fields employeeId, name, skills and comment should be added to the managed schema file, so that we can map the corresponding data from the document to those fields in the employeedata core.
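The screenshot is not reproduced here, but based on the fields just mentioned, an Employee.xml in Solr's XML update format could look roughly like this; the values are invented for illustration:

cat > Employee.xml <<'EOF'
<add>
  <doc>
    <field name="employeeId">10</field>
    <field name="name">BYA</field>
    <field name="skills">Java</field>
    <field name="comment">Works on big data projects</field>
  </doc>
  <!-- ... further <doc> elements for the other employees ... -->
</add>
EOF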

If you want, you can check the default schema fields and field types via curl. Make sure that you have already installed curl in your Linux or Windows system.

curl http://localhost:8983/solr/employeedata/schema
curl http://localhost:8983/solr/employeedata/schema/fields
curl http://localhost:8983/solr/employeedata/schema/fieldtypes

The managed schema file can be modified directly in the folder /solr-6.6.0/server/solr/employeedata/conf. Another way to change this file is to use the Apache Solr admin server, which is also quite simple. In this article, I will introduce a third method: using the REST API. Assume we only have remote access to our Solr server; in that case we can create a local file that contains everything we want to add or change in the managed schema and then send this file via HTTP POST to the server. The local file looks like the following picture:

HowtoSolr6
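The screenshot is not reproduced here either. One plausible shape for employeeFieldConf.json, matching the /schema/fields endpoint used below, is a JSON array of field definitions; the field names come from this article, while the types and flags are assumptions:

cat > employeeFieldConf.json <<'EOF'
[
  { "name": "employeeId", "type": "long",         "stored": true },
  { "name": "name",       "type": "string",       "stored": true },
  { "name": "skills",     "type": "text_general", "stored": true, "multiValued": true },
  { "name": "comment",    "type": "text_general", "stored": true }
]
EOF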

For more details about properties and types in the field please use this link: https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties

Please then navigate to the employeeFieldConf.json file directory and enter this command in command window:

curl -X POST -H Content-type:application/json http://localhost:8983/solr/employeedata/schema/fields --data-binary @employeeFieldConf.json

 

After execution of this command, we can check the new fields using the admin server console:

HowtoSolr7

As one can see, the fields employeeid, name, skill and comment have been created successfully in the managed schema. Wait! We didn't create a field called "id" in this case. But it has obviously been created nonetheless. Why?

Because every Core must have a unique key, the id is automatically created in the default managed schema file. Of course you can remove this field from the managed schema and change the unique key to another field.

Add documents

Everything is ready now, so we just need to add documents to our core and index the data in those documents. Note that there are two main ways of adding data to Solr: via HTTP (for example with curl) or using a native client. For this small example we will use curl to send an HTTP POST request. Please navigate to the directory containing Employee.xml and enter this command into the command window:

curl -H Content-type:text/xml http://localhost:8983/solr/employeedata/update?commit=true --data-binary @Employee.xml

The content type used in this example is text/xml. For JSON objects, we need to change the content type to application/json. Now we can check our core in the admin server again; the 4 new documents have already been inserted and indexed.

HowtoSolr8

Searching data

Now that the documents have been indexed, we can perform some queries on them. Apache Solr provides a REST API to access data and also provides different parameters to retrieve data. That means: for querying data we use HTTP GET, building a URL that includes the query parameters. For example, the following query shows all documents whose name field equals "BYA".

http://localhost:8983/solr/employeedata/select?q=name:BYA

The output will be shown as below:

HowtoSolr9

The response contains two parts: a response header and the response body in XML format. The response header shows some status and query information, and the results are shown in the response part. Please also note how the query URL is built: it includes hostname, port, application name, core name and the query parameter q (query: name=BYA). We can also use wildcards, the filter query parameter "fq", etc. Try some of the queries below yourself and check the results in the web browser. Are the results what you expected?

http://localhost:8983/solr/employeedata/select?fq=employeeId:[0 TO 30]&q=*:*
http://localhost:8983/solr/employeedata/select?fq=employeeId:[0 TO 5999]&indent=on&q=skills:Java&wt=json
http://localhost:8983/solr/employeedata/select?fq=employeeId:[0 TO 5999]&indent=on&q=(skills:Java) AND (comment:big*)&wt=json
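If you prefer the command line to the browser, the same queries can be sent with curl; the --data-urlencode option takes care of the spaces and special characters in the query. A sketch for the last query above:

curl -G "http://localhost:8983/solr/employeedata/select" \
  --data-urlencode 'q=(skills:Java) AND (comment:big*)' \
  --data-urlencode 'fq=employeeId:[0 TO 5999]' \
  --data-urlencode 'wt=json' \
  --data-urlencode 'indent=on'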

Updating documents

Please see this example below first:

HowtoSolr10

As you noticed, the last employee in our document has no skill data, so we want to add a new "skills" field to this employee. In addition, we also want to modify the comment: we want to get rid of the typo, and "unknown" is not a very good comment about an employee in a company. So how can we achieve that? We use the same approach as when adding documents to Apache Solr.

First we create an update data file on our local system; the example is shown below:

HowtoSolr11

A few aspects of this XML file are worth mentioning. The most important point is that you can only update an existing document via its unique key; in this example the key is stored in the "id" field, as this field has been defined to contain the unique key.

Second, if you want to add a new field to a document, you must set the attribute update="add"; if, on the other hand, you want to modify an existing field, you set the attribute update="set".

Once the file is created, we can send it via HTTP POST to the Apache Solr server by using this command from the directory containing EmployeeUpdate.xml:

curl -H Content-type:text/xml http://localhost:8983/solr/employeedata/update?commit=true --data-binary @EmployeeUpdate.xml

The output will be shown as below:

HowtoSolr12

For more details, options and attributes for updating documents please see this link: https://wiki.apache.org/solr/UpdateXmlMessages#add.2Freplace_documents

Delete document and core

The difference between deleting documents and updating documents is that you can remove documents by using a query clause. That means you can also delete documents via normal fields; the unique key is not necessary in this situation. Once again we use curl to send our command via HTTP POST. The first command deletes the document with name "BYA", the second command deletes all documents in the employeedata core.

curl http://localhost:8983/solr/employeedata/update?commit=true -H "Content-type:text/xml" --data-binary '<delete><query>name:BYA</query></delete>'
curl http://localhost:8983/solr/employeedata/update?commit=true -H "Content-type:text/xml" --data-binary '<delete><query>*:*</query></delete>'

In order to remove a Core, we can either run a command with curl or delete the Core using the script in the bin folder.

solr delete -c employeedata
curl "http://localhost:8983/solr/admin/cores?action=UNLOAD&core=employeedata&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true"

Scoring and Boosting

Assume that our query returned four documents. Which document should be placed at the top of the result list? How does Solr rank documents, and how can we tune the way Solr ranks and returns search results? To answer these questions we need to understand the scoring system as well as the boosting system in Solr. Solr uses the Lucene Java search library at its core, so we would have to understand the Lucene scoring system. But this is quite complex, so in this article I don't want to go into more detail. If you want to know more about the Lucene scoring system, please use this link: https://lucene.apache.org/core/2_9_4/scoring.html. There will be another article explaining the Lucene scoring system in detail coming soon.

The boosting system is more interesting as far as Apache Solr is concerned. The primary method of modifying document scores, and thereby changing the ranking, is boosting. There are two ways to change the ranking in Apache Solr: index-time boosts and query-time boosts.

  • Index-time: boosts are set when configuring the add-document file. We can set a boost at document level or at field level. We will see an example later.
  • Query-time: boosts are added in the query; you can give different weights to search terms in the query.

Let us see an example of index-time boosting:

HowtoSolr13

We have already seen this file before and used it to add documents to the Apache Solr server. The first and second red boxes are document-level boost settings: the whole document is weighted in the system. The bigger the number, the higher the document appears in the ranking. The last red box is a field-level boost setting. That means only the field "comment" is weighted for this document.
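The boosted file itself is not reproduced here. In Solr's XML update format, document-level and field-level boosts are expressed as boost attributes, roughly like this; the values are invented for illustration:

cat > EmployeeBoost.xml <<'EOF'
<add>
  <doc boost="2.0">  <!-- document-level boost: the whole document is weighted higher -->
    <field name="employeeId">11</field>
    <field name="name">lala</field>
    <field name="skills">Perl</field>
    <!-- field-level boost: only this field is weighted higher -->
    <field name="comment" boost="3.0">Senior developer</field>
  </doc>
</add>
EOF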

Query-time boosting:

An example: we want to know which employees have the skill C or the skill Perl. The result is shown below:

HowtoSolr14

The important point is that we use the logical operator "or" in this query URL. The result shows that the employee "lala" is placed at the top and the employee ADB comes second. But unfortunately we need an employee with Perl skills immediately. That means we should still search for both skill "C" and skill "Perl", but skill "Perl" is more important than skill "C". How can we do that in Solr?

In query clauses we can use the caret character ^ followed by a positive number as a query boost to give an important term more weight.

http://localhost:8983/solr/employeedata/select?indent=on&q=skills:C or (skills:Perl)^10&wt=json

 

So run this URL again in a web browser. You will see a different result:

HowtoSolr15

As we can see, the ranking is now reversed. Solr doesn't allow negative boosts, so if you want to add negative weight to your query clause, use a negated field instead of a negative boost. For example: (-skills:Perl)^2.

Query-time boosting via functions:

In addition, Solr provides many functions to add boosting to a query. If you want to know more about the list of functions and their usage, please use this link: https://wiki.apache.org/solr/FunctionQuery

Conclusion

That’s it.

This article describes only the basics of Solr. I hope it brings you a little closer to the Apache Solr world and that you have a good time travelling through it.

Posted in Analytics & BigData

AWS News Week 28

.NET Core support for AWS CodeStar and AWS CodeBuild

AWS CodeStar has been available for a few months and gives development teams the ability to quickly develop, build and deploy applications.

Until now, AWS CodeStar supported deployments to EC2, Elastic Beanstalk and Lambda for the languages HTML5, Java, JavaScript, PHP, Python and Ruby.

In addition to these languages, AWS CodeStar can now also be used for .NET applications. This opens up the benefits of AWS CodeStar and AWS CodeBuild to .NET developers as well.

CodeStarTemplates

More information here.

Target Tracking Policies for EC2 Auto Scaling

If an Auto Scaling group uses target tracking, a target value is defined for a specific CloudWatch metric. Auto Scaling then makes sure this target value is reached by adding or removing instances (see the CLI sketch below). The following CloudWatch metrics are available:

  • Application Load Balancer request count per target (also new)
  • average CPU utilization
  • average network in/out
  • custom metrics

More information here.
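A minimal sketch of such a target tracking policy via the AWS CLI, assuming an existing Auto Scaling group named my-asg and a target of 50 percent average CPU utilization:

# Create a target tracking scaling policy on an existing Auto Scaling group
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name cpu-target-50 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "TargetValue": 50.0,
    "PredefinedMetricSpecification": { "PredefinedMetricType": "ASGAverageCPUUtilization" }
  }'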

 

 

Posted in Cloud & Infrastructure