Arknights is a tower defense mobile game that I have been playing for over a year now. The fans have created multiple tools and portals to aid in each other’s progress. Inspired by them, I decided to start working on a Data Science personal project for the game.
The result I am sharing today is the Python script to web scrape a dataset of character stats. This won’t be a tutorial; rather, it will be a show-and-tell of sorts where I explain my thought process and how my code progressed to the final solution.
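The core of such a scrape can be sketched with nothing but the standard library. The HTML snippet and stat names below are made up for illustration; a real run would download the page first (e.g. with requests) rather than parse a hardcoded string.

```python
from html.parser import HTMLParser

# Hypothetical snippet standing in for a character-stats page;
# the real script would fetch live HTML instead.
SAMPLE_HTML = """
<table>
  <tr><td>HP</td><td>1620</td></tr>
  <tr><td>ATK</td><td>455</td></tr>
</table>
"""

class StatsParser(HTMLParser):
    """Collects the text of every <td> cell in document order."""
    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td and data.strip():
            self.cells.append(data.strip())

parser = StatsParser()
parser.feed(SAMPLE_HTML)
# Pair up consecutive cells as (stat name, value) rows
stats = dict(zip(parser.cells[::2], parser.cells[1::2]))
print(stats)  # {'HP': '1620', 'ATK': '455'}
```

In practice a library like BeautifulSoup makes this much shorter, but the idea is the same: walk the markup, keep the cells you care about, and pair them into records.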
In today’s article I am going to share a solution for creating an array of dates to loop through in an Azure Data Factory pipeline. An array like this is useful when you need to run a set of activities for specific dates and you need that period to be dynamic: for instance, in one pipeline run you might want the last week, but in the next you need the whole month.
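The underlying idea, sketched in plain Python for illustration only; in Data Factory itself this would be built with pipeline expressions and consumed by a ForEach activity:

```python
from datetime import date, timedelta

def date_range(start: date, end: date) -> list[str]:
    """Return every date from start to end inclusive, as ISO strings."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]

# "Last week" in one run...
week = date_range(date(2021, 3, 1), date(2021, 3, 7))
print(week)  # ['2021-03-01', ..., '2021-03-07']

# ...and the whole month in the next, just by changing the inputs
month = date_range(date(2021, 3, 1), date(2021, 3, 31))
print(len(month))  # 31
```

The point is that the loop body stays the same; only the start and end parameters change between runs.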
Aside from an Azure subscription and a Data Factory resource, the things needed are:
Sometimes you have multiple columns of measures for a single purpose, yet you only want to keep the one that best fits your needs.
In this demo we’ll analyse a synthetic clustering model output dataset. The trick is that we have columns with the distance to the centre of each cluster, but not a column with the cluster assignment itself. This makes it hard to analyse the model predictions any further.
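A minimal pandas sketch of how such an assignment column could be derived, using hypothetical `dist_cluster_*` column names and made-up distances: the assigned cluster is simply the column with the smallest distance in each row.

```python
import pandas as pd

# Hypothetical layout: one distance-to-centre column per cluster
df = pd.DataFrame({
    "dist_cluster_0": [0.2, 1.4, 0.9],
    "dist_cluster_1": [1.1, 0.3, 0.8],
    "dist_cluster_2": [2.0, 2.2, 0.1],
})

dist_cols = [c for c in df.columns if c.startswith("dist_cluster_")]

# idxmin(axis=1) gives the name of the smallest column per row;
# stripping the prefix leaves the cluster number
df["cluster"] = (
    df[dist_cols]
    .idxmin(axis=1)
    .str.replace("dist_cluster_", "", regex=False)
    .astype(int)
)
print(df["cluster"].tolist())  # [0, 1, 2]
```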
While we’ve made great strides in analysing structured and semi-structured data, unstructured data still lags behind. Of course, this third type presents challenges not found in the other two, but we are steadily finding more options and tools for the job.
One of those tools is, as expected, Artificial Intelligence. Microsoft Azure has some interesting options for this job, namely Cognitive Services. As per their description,
“Cognitive Services brings AI within reach of every developer — without requiring machine-learning expertise”.
Today I bring you a guide on how this description is realized in practice by showing you how to…
I’ve been giving some serious thought as to how I learned Power BI. I do Power BI development for a living, but my learning experience was so all over the place that I’ve been trying to come up with a focused path for other people.
This is my attempt at designing that path. There’s not really a fixed order to these resources; you can go back and forth based on your level of comfort and curiosity.
Note for these tutorials and/or learning resources: don’t be thrown off by having a different UI in your Power BI version or having more…
My career choice is Data Science. However, 3D modeling has always had a special place in my heart, probably because video games play a big role in my free time (pun intended).
I’ve been slowly learning Blender and 3D modeling since last summer, so today I decided to make a short compilation of the best tutorials I’ve completed: those that were jam-packed with helpful beginner tips and that, because of their timing or sheer quality, stuck with me the most.
Don’t get me wrong, I’m still a Blender noob, but at this point I’ve gone through several tutorials…
In the previous articles we’ve created four different Jupyter Notebooks that achieve different data transformations and visualizations of the 2020 Stack Overflow Developer Survey data. Today we are moving all of that to the cloud, more specifically to an Azure Databricks workspace.
As prerequisites, you need an Azure account with a valid subscription, and of course an Azure Databricks (ADB) workspace (check this documentation page on how to create one). After you’ve created the workspace, I will show you how to set up a cluster to run your computations on, and how to upload the notebooks and the data.
Today we reach the fourth part of this series, the last part about writing code. The fifth and final part will be about moving our notebooks to the cloud, to an Azure Databricks workspace.
But before that, we need to analyse the programming languages used by the respondents of the 2020 Stack Overflow Developer Survey. This column comes in an awkward format: each developer’s choices are packed into a single column, separated by semicolons (;). …
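As a rough illustration of the cleanup (the sample values below are made up, not taken from the survey), pandas can split on the semicolon, explode to one language per row, and count:

```python
import pandas as pd

# Hypothetical sample of the semicolon-separated languages column
langs = pd.Series([
    "Python;SQL",
    "JavaScript;Python;HTML/CSS",
    "SQL",
])

# One language per row, then count how often each appears
counts = langs.str.split(";").explode().value_counts()
print(counts)  # Python and SQL appear twice, the rest once
```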
So far in this series we’ve worked with numerical data. Today we’ll analyse the education of respondents to the 2020 Stack Overflow Developer Survey by finding out which are the most frequent education levels.
Thankfully this is a straightforward demo. We just need to map the original options to new values (e.g. “Master’s degree (M.A., M.S., M.Eng., MBA, etc.)” to simply “Master’s degree”), count their frequencies and plot the bar chart.
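A minimal sketch of that map-and-count step in pandas, using a couple of made-up rows and only the two mappings mentioned here:

```python
import pandas as pd

# Hypothetical sample of the education column
ed = pd.Series([
    "Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",
    "Bachelor’s degree (B.A., B.S., B.Eng., etc.)",
    "Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",
])

# Map the verbose original options to short labels
mapping = {
    "Master’s degree (M.A., M.S., M.Eng., MBA, etc.)": "Master’s degree",
    "Bachelor’s degree (B.A., B.S., B.Eng., etc.)": "Bachelor’s degree",
}
counts = ed.map(mapping).value_counts()
print(counts)
# counts.plot.barh()  # bar chart of the frequencies (requires matplotlib)
```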
As usual, here are some handy links to navigate the contents of this series:
In the first part of this series we went through some exploratory data analysis of ages to filter out the bad data, and at the end plotted a bar chart with the age frequencies. Today we are working on the annual compensations from the 2020 Stack Overflow Developer Survey results.
This will involve binning the values so that we can plot them in a histogram at the end. For that, we need to create bin labels (to improve the visualization) and the bin intervals. We’ll make plenty of use of the wonderful list comprehension feature of Python!
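A rough sketch of the binning idea, with made-up compensation values and hypothetical 50k-wide bins; both the edges and the labels come from list comprehensions:

```python
import pandas as pd

# Hypothetical annual compensation values
comp = pd.Series([12_000, 48_000, 75_000, 105_000, 230_000])

# Bin edges every 50k up to 250k, built with a list comprehension
edges = [i * 50_000 for i in range(6)]  # [0, 50000, ..., 250000]

# Human-friendly labels for each (lo, hi] interval
labels = [f"{lo // 1000}k-{hi // 1000}k" for lo, hi in zip(edges[:-1], edges[1:])]

binned = pd.cut(comp, bins=edges, labels=labels)
print(binned.value_counts().sort_index())  # two values land in the 0k-50k bin
```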
While Plotly can bin…
I write about data science to help other people who might come across the same problems