Archive

Latest

June 27, 2020

Mastering ElasticSearch Queries If You Have Only Worked With SQL Before

Elasticsearch is often the engine of choice for storing and querying full-text data. But writing an Elasticsearch query is quite different from querying a relational database with SQL. In this blog post, you will learn the basics you need to understand before working with Elasticsearch. In the second part, you will learn how to write queries in Elasticsearch.

June 24, 2020

How the Inverted Index and Scoring Work in ElasticSearch

In Elasticsearch, querying full-text fields is among the least resource-intensive tasks, and query results are ordered with the most relevant ones on top. But how does this work?

June 7, 2020

Working with Complex Datatypes in Hive

The basic idea of complex datatypes is to store multiple values in a single column. So if you are working with a Hive database, query a column, and then notice “The value I need is trapped in a column among other values…”, you have just come across a complex (a.k.a. nested) datatype. There are three types: arrays, maps and structs. First, you have to understand which types are present.

September 10, 2019

Plotting with Seaborn

Seaborn is a Python library for creating plots. It is based on matplotlib and provides a high-level interface for drawing statistical graphics. Seaborn integrates nicely with pandas: it operates on DataFrames and arrays and does aggregations and semantic mapping automatically, which makes it a quick, convenient option for data visualization in your data projects. Once you understand the basic concepts, you can create plots really easily without consulting Stack Overflow too much.

August 18, 2019

Mastering Data Preparation with Pandas: Subsetting, Filtering and Joining DataFrames

When I started working with pandas, I noticed that there were many ways to subset, filter and join data, but I was lacking a systematic overview. How do the different approaches differ, and when should you use which? In this blog post we’ll look at different ways of subsetting, filtering and combining DataFrames. Subsetting Data: Selecting subsets of rows and columns by labels and positions.
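As a rough illustration of the difference between selecting by labels and by positions mentioned in this teaser, here is a minimal sketch. The DataFrame, its column names and its index labels are made up for this example and are not taken from the post itself.

```python
import pandas as pd

# Hypothetical example data; names and values are assumptions for illustration.
df = pd.DataFrame(
    {"city": ["Berlin", "Hamburg", "Munich"], "population_m": [3.6, 1.8, 1.5]},
    index=["b", "h", "m"],
)

# Label-based selection with .loc: rows labelled "b" and "m", column "city".
by_label = df.loc[["b", "m"], "city"]

# Position-based selection with .iloc: first two rows, second column.
by_position = df.iloc[:2, 1]

# Filtering with a boolean mask: keep rows where the condition holds.
large = df[df["population_m"] > 1.7]
```

`.loc` addresses rows and columns by their labels, while `.iloc` addresses them by integer position, so the two can return different rows once the index is not a plain 0..n-1 range.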