
pandas is a wonderful library for working with data in Python. If you’re accustomed to tabular data, you will feel right at home with pandas and, better yet, you get to do it while writing Python code. I started working with this library a couple of years ago, but I only started using it seriously last year. In that time, I’ve come across many useful functions, so today I will briefly show off five that have stood out to me for their applications.

Categorical

Sometimes there is a need for a custom sorting order. If you call the sort_values function on a column of month names, the result will be in alphabetical order, not the natural January to December order. …
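To make the difference concrete, here is a minimal sketch of the idea. The DataFrame, its column names and its values are made up for illustration, and only three months are used to keep it short.

```python
import pandas as pd

# Hypothetical sample data: three months, deliberately out of order.
df = pd.DataFrame({"month": ["March", "January", "February"], "sales": [30, 10, 20]})

# A plain sort compares strings, so the result is alphabetical.
alphabetical = df.sort_values("month")["month"].tolist()

# Declare the natural order once, then sort by it.
month_order = ["January", "February", "March"]
df["month"] = pd.Categorical(df["month"], categories=month_order, ordered=True)
natural = df.sort_values("month")["month"].tolist()

print(alphabetical)  # ['February', 'January', 'March']
print(natural)       # ['January', 'February', 'March']
```

Once the column is an ordered categorical, sort_values follows the order of the categories list instead of comparing the strings.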



As with many other things, Python is pretty good for programmatic image editing, in large part thanks to the Pillow library. Creating an image from scratch, pasting a couple of layers on top, or applying some filters is easily accomplished with a few lines of code.

Today, I will show you a short script that creates custom-shape masks. We’ll load a Santa hat image that has no background and add a background to it, along with a shadow, in fewer than 20 lines of code!

You can find the original Santa hat image here.

If you’re already familiar with Pillow, then this code will look simple to you, and even if this is your first time coming across this library, I think the comments in the code will suffice. But still, let me highlight the most important bits. …
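As a rough idea of the technique, and not the script from the article, the sketch below uses an image's alpha channel as a custom-shape mask. A red triangle stands in for the Santa hat so that no file is needed; the sizes, colors, offsets and blur radius are all arbitrary assumptions.

```python
from PIL import Image, ImageDraw, ImageFilter

# Stand-in for the Santa hat: a red triangle on a transparent background.
hat = Image.new("RGBA", (100, 100), (0, 0, 0, 0))
ImageDraw.Draw(hat).polygon([(50, 5), (5, 95), (95, 95)], fill=(200, 0, 0, 255))

# The alpha channel doubles as a mask with the exact shape of the hat.
mask = hat.getchannel("A")

# Shadow: a solid black layer, shaped by a blurred copy of the same mask.
shadow = Image.new("RGBA", hat.size, (0, 0, 0, 255))
shadow_mask = mask.filter(ImageFilter.GaussianBlur(4))

# White background; paste the shadow slightly offset, then the hat on top.
canvas = Image.new("RGBA", (200, 200), (255, 255, 255, 255))
canvas.paste(shadow, (58, 58), shadow_mask)
canvas.paste(hat, (50, 50), mask)
```

The third argument of paste is what keeps the transparent areas of the hat from overwriting the background: only pixels where the mask is non-zero are copied.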



Among the many powerful connectors available in Power BI, the Web connector is a great option if you wish to integrate web scraping in your reports. While it does have some limitations, such as only scraping HTML tables, it is nonetheless a strong addition to your ETL capabilities in Power BI.

In this tutorial, I will walk you through the scraping of country flag images from here. We will start by connecting to the website to extract the tables of countries, and then write some M code in Power Query to create the URLs for each country flag image (yes, we will get our hands dirty writing M). …



When working with pandas DataFrames, sometimes there is a need to sort data in a column by a specific order. For example, you may want to sort a DataFrame by its column of months so that they are properly ordered for a time series visualization. The problem is, a normal sort will get your months sorted alphabetically, not in the natural January to December order.

It’s in cases like these that the Categorical function can help. Just like you can transform a column to a numeric type, you can also transform it to the category type to be treated as a proper categorical column. More than being treated as a column of categorical data, at the moment of casting you can specify a list or an array containing the unique categories (i.e., …


As I was working on freeCodeCamp’s Data Analysis with Python certification, I came across a tricky Matplotlib visualization: a grouped bar chart. I’ve been making my way through the projects, but the guidance is minimal. This is good because it makes you put in the work to arrive at the desired solution, but it is awful if you don’t have much experience with Matplotlib, pandas and NumPy, or even if you’re just having difficulties with the current exercise.

So, I’m writing this article to share my solution on how to create the grouped bar chart from the “Page View Time Series Visualizer” project. I had a hard time understanding how to create this visualization in Matplotlib so I hope this article is enlightening for your data analysis projects. …
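For readers who want a starting point, here is one possible way to build a grouped bar chart of average daily page views per month, grouped by year. It is a sketch and not necessarily the solution from the article: the data is made up, and the "value" column name and axis labels are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up page views: one value per day over two years.
dates = pd.date_range("2020-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"value": np.arange(len(dates))}, index=dates)

# Rows are years, columns are month numbers, cells are mean daily views.
table = df.groupby([df.index.year, df.index.month])["value"].mean().unstack()

fig, ax = plt.subplots(figsize=(10, 5))
n_months = table.shape[1]
bar_width = 0.8 / n_months
x = np.arange(len(table.index))  # one group of bars per year

for i, month in enumerate(table.columns):
    # Shift each month's bars so they sit side by side within the year group.
    ax.bar(x + i * bar_width, table[month], width=bar_width, label=str(month))

# Center the year labels under each group of bars.
ax.set_xticks(x + bar_width * (n_months - 1) / 2)
ax.set_xticklabels(table.index)
ax.set_xlabel("Years")
ax.set_ylabel("Average Page Views")
ax.legend(title="Months", ncol=2)
```

The key step is the offset `x + i * bar_width`: every month is drawn with its own call to ax.bar, nudged to the right of the previous one.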



Regular expression (regex) patterns look like nonsense, and yet they are a powerful tool for extracting information from text. A bunch of seemingly random punctuation, combined with a lot of parentheses and a couple of letters here and there, actually manages to find the information you’re looking for.

So, in the spirit of practice, I’m writing this article to show you how to extract the name and cost of expenses from a text file. It won’t be the easiest example possible, but we will go over several regex features to arrive at the solution. …
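To give a taste of what such an extraction can look like, here is a small sketch. The text format below is hypothetical (the file used in the article is not shown here), and so are the pattern and the group names.

```python
import re

# Hypothetical file contents, one expense per line plus an unrelated note.
text = """\
Groceries: 45.30 EUR
Train ticket: 12 EUR
Note: paid rent late
Coffee beans: 8.99 EUR
"""

# name = everything before the colon; cost = digits with optional decimals.
pattern = re.compile(
    r"^(?P<name>[^:\n]+):\s*(?P<cost>\d+(?:\.\d{1,2})?)\s*EUR$",
    re.MULTILINE,
)

expenses = [(m.group("name"), float(m.group("cost"))) for m in pattern.finditer(text)]
print(expenses)
# [('Groceries', 45.3), ('Train ticket', 12.0), ('Coffee beans', 8.99)]
```

Named groups keep the pattern readable, and re.MULTILINE makes ^ and $ match at the start and end of every line, so the note line is simply skipped.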



Imagine you have a spreadsheet of employee data. You have their name, their age, their contact information, etc. This data changes over time as employees enter and/or leave the company, or simply update their information.

In today’s tutorial, I will show you how to update a pandas DataFrame in this professional context to keep track of the employee status: if they are still present in the new DataFrame, then we just update their information; otherwise we assume they have left the company. Brand new employees will simply be added to our dataset. We’ll assume we get a new listing of our employees at regular intervals and all current employees are included in that listing. …
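The rules above can be sketched with an outer merge. This is only one possible approach under stated assumptions, not necessarily the one from the tutorial: the "id" key, the column names, the "status" labels and the sample employees are all invented for the example.

```python
import pandas as pd

# Hypothetical current records, keyed by an assumed unique "id" column.
current = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Ana", "Bruno", "Carla"],
    "email": ["ana@old.example", "bruno@corp.example", "carla@corp.example"],
})
# Hypothetical new listing: Ana changed email, Bruno is gone, Dinis is new.
new_listing = pd.DataFrame({
    "id": [1, 3, 4],
    "name": ["Ana", "Carla", "Dinis"],
    "email": ["ana@corp.example", "carla@corp.example", "dinis@corp.example"],
})

# Outer merge keeps everyone; the indicator column says where each row came from.
merged = current.merge(
    new_listing, on="id", how="outer", suffixes=("_old", ""), indicator=True
)

# Anyone missing from the new listing is assumed to have left.
merged["status"] = "active"
merged.loc[merged["_merge"] == "left_only", "status"] = "left"

# Former employees have no new data, so fall back to their last known values.
for col in ["name", "email"]:
    merged[col] = merged[col].fillna(merged[f"{col}_old"])

updated = merged[["id", "name", "email", "status"]].sort_values("id").reset_index(drop=True)
print(updated)
```

Employees found in both DataFrames get the values from the new listing, those found only in the old data are kept and flagged as having left, and those found only in the new listing are added as active.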


Recently I came across Streamlit and I’m enjoying my time developing with it. As described in the documentation,

Streamlit’s open-source app framework is the easiest way for data scientists and machine learning engineers to create beautiful, performant apps in only a few hours

And it’s true. The API is easy to learn but powerful. For instance, to create a basic web app that receives some text input and writes it back to the page, you only need three lines of Python code:

import streamlit as st
my_text_input = st.text_input("Type to your heart's content", "Hello World")
st.write(f"You entered: {my_text_input}")

st.text_input creates a text input field. The first argument is the label displayed above the field, and the second argument is the default value. The function returns the current value of the field as a string, which is saved in the my_text_input variable. In the next line, st.write simply writes the string to the page. Finally, run streamlit run your_script.py in the command line to open the app in your browser. …


PIL is a great package to create and edit images in Python. However, one problem it has is in calculating text size. So today I’m going to show you a short Python script used to draw properly centered text in images created with PIL.

The problem

Unfortunately, the textsize method of the drawing interface does not return the proper text size of a string.

For instance, with this sample code

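from PIL import Image, ImageDraw, ImageFont
width, height = 400, 200  # example canvas size (assumed)
font = ImageFont.load_default()  # any loaded font works here (assumed)
img = Image.new("RGB", (width, height), color="white")
draw_interface = ImageDraw.Draw(img)
size_width, size_height = draw_interface.textsize("some string", font)  # removed in Pillow 10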

size_width and size_height would hold incorrect values.

The solution

So, the solution is to use different calculations for the text size. Following the solution posted in this Stack Overflow thread (with some changes), the function below is what I ended up with. …



Introduction

Today I am going to show you how to create and modify a PostgreSQL database in Python, with the help of the psycopg2 library.

Unlike SQLAlchemy, which generates SQL queries while mapping the database schema to Python objects, psycopg2 takes your hand-crafted SQL queries and executes them against the database. In other words, SQLAlchemy is an ORM (Object-Relational Mapper) and psycopg2 is a database driver for PostgreSQL.

Create the database and its tables

We’ll be creating a dead simple housing database that consists of two tables: “person” and “house”.

About

José Fernando Costa

I write technical articles about data analysis and other things that catch my attention
