Sometimes you have multiple columns of measures for a single purpose, yet you only want to keep the one that performs according to your needs.

(source)

In this demo we’ll analyse a synthetic clustering model output dataset. The trick is that we have columns with the distance to the centre of each cluster, but not a column with the cluster assignment itself. In other words, it becomes hard to further analyse the model predictions.
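A minimal sketch of one way to recover the missing assignment: take, for each row, the distance column with the smallest value. The column names and values below are hypothetical, and `DataFrame.idxmin(axis=1)` is a swapped-in technique for illustration, not necessarily the approach used in the full article.

```python
import pandas as pd

# Hypothetical synthetic model output: one column per cluster holding the
# distance from each row to that cluster's centre (names are assumptions).
df = pd.DataFrame(
    {
        "dist_cluster_0": [0.5, 2.1, 3.3],
        "dist_cluster_1": [1.7, 0.4, 2.9],
        "dist_cluster_2": [2.2, 1.8, 0.6],
    }
)

distance_cols = [c for c in df.columns if c.startswith("dist_cluster_")]

# idxmin(axis=1) returns, for each row, the *name* of the column with the
# smallest distance, i.e. the nearest cluster centre; we then strip the
# prefix to keep only the cluster number.
df["cluster"] = (
    df[distance_cols]
    .idxmin(axis=1)
    .str.replace("dist_cluster_", "", regex=False)
    .astype(int)
)

# Keep the winning distance too, in case we want to filter weak assignments.
df["distance"] = df[distance_cols].min(axis=1)

print(df["cluster"].tolist())  # [0, 1, 2]
```

With a proper `cluster` column in place, the usual pandas tooling (`groupby`, `value_counts`) applies to the predictions again.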


While we’ve made great strides in analysing structured and semi-structured data, unstructured data is still lagging behind. Of course, this third type presents challenges not found in the other two, but we are steadily finding more options and tools for the job.

One of those tools is, as expected, Artificial Intelligence. Microsoft Azure has some interesting options for this job, namely Cognitive Services. As per their description,

“Cognitive Services brings AI within reach of every developer — without requiring machine-learning expertise”.

Today I bring you a guide on how this description is realized in practice by showing you how to…


Cover image
(source)

I’ve been giving some serious thought to how I learned Power BI. I do Power BI development for a living, but my learning experience was so scattered that I’ve been trying to come up with a focused path for other people.

This is my attempt at designing that path. There’s no fixed order to these resources; you can go back and forth based on your level of comfort and curiosity.

Note for these tutorials and/or learning resources: don’t be thrown off by having a different UI in your Power BI version or having more…


My career choice is Data Science. However, 3D modeling has always had a special place in my heart, probably because video games play a big role in my free time (pun intended).

I’ve been slowly learning Blender and 3D modeling since last summer, so today I decided to make a short compilation of the best tutorials I’ve completed: the ones that were jam-packed with helpful beginner tips and that, because of their timing or sheer quality, stuck with me the most.

Don’t get me wrong, I’m still a Blender noob, but at this point I’ve gone through several tutorials…


Azure Databricks icon

In the previous articles we created four Jupyter Notebooks that transform and visualize the 2020 Stack Overflow Developer Survey data in different ways. Today we are moving all of that to the cloud, more specifically to an Azure Databricks workspace.

As prerequisites, you need an Azure account with a valid subscription and, of course, an Azure Databricks (ADB) workspace (check this documentation page on how to create one). After you’ve created the workspace, I will show you how to set up a cluster to run your computations on and how to upload the notebooks and the data.

Of…


Cover image (source)

Today we reach the fourth part of this series, the last one about writing code. The fifth and final part will be about moving our notebooks to the cloud, to an Azure Databricks workspace.

But before that, we need to analyse the programming languages used by the respondents of the 2020 Stack Overflow Developer Survey. This column comes in an awkward format: each developer’s choices are stored in the same cell, separated by semicolons (;). …
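A minimal sketch of how such a semicolon-separated column can be turned into per-language counts, assuming a column named `LanguageWorkedWith` and a tiny made-up sample (neither is taken from the article). It chains pandas’ `str.split`, `explode` and `value_counts`; the article’s own approach may differ.

```python
import pandas as pd

# Hypothetical sample mimicking the survey's semicolon-separated answers
# (the column name "LanguageWorkedWith" is an assumption here).
survey = pd.DataFrame(
    {"LanguageWorkedWith": ["Python;SQL", "JavaScript;Python", None, "SQL"]}
)

language_counts = (
    survey["LanguageWorkedWith"]
    .dropna()          # skip respondents who left the question blank
    .str.split(";")    # "Python;SQL" -> ["Python", "SQL"]
    .explode()         # one row per (respondent, language) pair
    .value_counts()    # number of respondents who picked each language
)

print(language_counts)
```

Exploding into one row per language keeps the original index, so each language can still be traced back to the respondent who chose it.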


Cover image (source)

So far in this series we’ve worked with numerical data. Today we’ll analyse the education of respondents to the 2020 Stack Overflow Developer Survey by finding out which education levels are the most frequent.

Thankfully this is a straightforward demo. We just need to map the original options to new values (e.g. “Master’s degree (M.A., M.S., M.Eng., MBA, etc.)” to simply “Master’s degree”), count their frequencies and plot the bar chart.
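The map-then-count step could look roughly like the sketch below. Only the Master’s degree option is quoted in the article; the other raw string and the `EdLevel` name are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample of raw answers; only the Master's label comes from
# the article, the Bachelor's string is an illustrative assumption.
education = pd.Series(
    [
        "Master's degree (M.A., M.S., M.Eng., MBA, etc.)",
        "Bachelor's degree (B.A., B.S., B.Eng., etc.)",
        "Master's degree (M.A., M.S., M.Eng., MBA, etc.)",
        None,
    ],
    name="EdLevel",
)

# Map each long option to a short label that fits nicely on a chart axis.
short_names = {
    "Master's degree (M.A., M.S., M.Eng., MBA, etc.)": "Master's degree",
    "Bachelor's degree (B.A., B.S., B.Eng., etc.)": "Bachelor's degree",
}

# Series.map turns unmapped values (and missing answers) into NaN,
# which value_counts() drops by default.
ed_counts = education.map(short_names).value_counts()
print(ed_counts)
```

From here, the counts can be handed to Plotly Express, for example with `px.bar(x=ed_counts.index, y=ed_counts.values)`.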

As usual, here are some handy links to navigate the contents of this series:


Cover image (source)

In the first part of this series we went through some exploratory data analysis of ages to filter out the bad data, and at the end plotted a bar chart with the age frequencies. Today we are working on the annual compensations in the 2020 Stack Overflow Developer Survey results.

This will involve binning the values so that we can plot them in a histogram at the end. For that, we need to create the bin intervals and the bin labels (to improve the visualization). We’ll make plenty of use of Python’s wonderful list comprehensions!
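As a rough sketch of the idea, assuming made-up compensation values and bins of 50,000 (neither comes from the article), both the bin edges and their labels can be built with list comprehensions and passed to `pandas.cut`:

```python
import pandas as pd

# Hypothetical annual compensations in USD (not real survey data).
compensation = pd.Series([12_000, 45_000, 78_000, 150_000, 260_000])

# Bin edges every 50,000 up to 300,000, built with a list comprehension.
step = 50_000
edges = [i * step for i in range(0, 7)]  # [0, 50000, ..., 300000]

# Readable labels such as "0-50k", also built with a list comprehension.
labels = [f"{lo // 1000}-{hi // 1000}k" for lo, hi in zip(edges[:-1], edges[1:])]

# pd.cut places each value in a right-closed interval (lo, hi] by default.
binned = pd.cut(compensation, bins=edges, labels=labels)

# Count values per bin, keeping bin order (and empty bins) for the histogram.
bin_counts = binned.value_counts(sort=False)
print(bin_counts)
```

Because the result is categorical, empty bins still show up with a count of zero, which keeps gaps visible in the histogram.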

While Plotly can bin…


Cover image (source)

After posting a handful of separate articles on data analysis with Python, I’ve decided to share some of the work I did on previous personal projects in the form of a proper series.

This “Python Data Analysis” series will consist of five articles tackling different data problems using the 2020 Stack Overflow Developer Survey results dataset. I will show you how to use pandas to overcome issues with numeric and categorical data, and then how to create nice visualizations with Plotly Express.

Although I only show a Python script here, each article has its own Jupyter notebook with the same…


Cover image
(source)

pandas is a wonderful library for working with data in Python. If you’re accustomed to tabular data, you will feel right at home with pandas, and better yet, you get to do it while writing Python code. I started working with this library a couple of years ago, but I only began using it seriously last year. In that time I’ve come across many useful functions, so today I will briefly show off five that have stood out to me for their applications.

Categorical

Sometimes there is a need for a custom sorting order. If you try to use the sort_values function in a column with…
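A short sketch of one common way to get a custom sort order, using pandas’ ordered `Categorical`. The size column and its values are hypothetical examples, not data from the article.

```python
import pandas as pd

# Hypothetical data: sizes that should sort S < M < L, not alphabetically.
df = pd.DataFrame({"size": ["M", "L", "S", "M"], "qty": [3, 1, 4, 2]})

# With plain strings, sort_values is alphabetical: L, M, M, S.
print(df.sort_values("size")["size"].tolist())

# An ordered Categorical carries its own sort order.
df["size"] = pd.Categorical(df["size"], categories=["S", "M", "L"], ordered=True)
print(df.sort_values("size")["size"].tolist())  # ['S', 'M', 'M', 'L']
```

Since the order lives in the column’s dtype, later calls to `sort_values` (or comparisons such as `df["size"] > "S"`) respect it without extra arguments.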

José Fernando Costa

I write technical articles about data analysis and other things that catch my attention
