Archive

Latest

Mastering Elasticsearch Queries If You Have Only Worked With SQL Before

Elasticsearch is often the engine of choice for storing and querying full-text data. But writing an Elasticsearch query is quite different from querying a relational database in SQL. In this blog post, you will learn some basics you need to understand before working with Elasticsearch. In the second part, you will learn how to write queries in Elasticsearch.

Working with Complex Datatypes in Hive

The basic idea of complex datatypes is to store multiple values in a single column. So if you are working with a Hive database, query a column, and then notice “The value I need is trapped in a column among other values…”, you have just come across a complex (a.k.a. nested) datatype. There are three types: arrays, maps and structs. First, you have to understand which types are present.

Plotting with Seaborn

Seaborn is a Python library for creating plots. It is based on matplotlib and provides a high-level interface for drawing statistical graphics. Seaborn integrates nicely with pandas: it operates on DataFrames and arrays and does aggregations and semantic mapping automatically, which makes it a quick, convenient option for data visualization in your data projects. Once you understand the basic concepts, you can create plots really easily without consulting Stack Overflow too much.

Mastering Data Preparation with Pandas: Subsetting, Filtering and Joining DataFrames

When I started working with pandas, I noticed that there were so many ways to subset, filter and join data. But I was lacking a systematic overview. How do the different approaches differ, and when should you use which? In this blog post, we’ll look at different ways of subsetting, filtering and combining DataFrames. Subsetting Data: Selecting subsets of rows and columns by labels and positions.
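
To give a taste of what selecting by labels versus positions looks like, here is a minimal sketch. The DataFrame, its column names and its index labels are made up for illustration and do not come from the post itself; `loc` and `iloc` are the standard pandas indexers for label-based and position-based selection.

```python
import pandas as pd

# Hypothetical example data, invented for this sketch.
df = pd.DataFrame(
    {"city": ["Berlin", "Hamburg", "Munich"], "population_m": [3.7, 1.9, 1.5]},
    index=["a", "b", "c"],
)

# Label-based selection: rows "a" through "b" (end label is included),
# and the column named "city".
by_label = df.loc["a":"b", ["city"]]

# Position-based selection: the first two rows (end position is excluded),
# and the first column.
by_position = df.iloc[0:2, [0]]
```

Both selections pick the same rows and column here. The thing to watch is that label slices with `loc` include the end label, while position slices with `iloc` exclude the end position.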