Things to consider when creating your own learning journey

Introduction

Photo by Dmitry Ratushny on Unsplash

Data engineering is an essential field in the data industry. Many activities have to happen around data before we can make data-driven decisions. Examples include:

  • Move data from one place to another.
  • Serve data to various consuming systems, ranging from in-house analytics to third parties outside the organization.
  • Answer business questions on time.
  • Ensure data quality from the front-end application to back-end analytics.
  • Govern data access policies.

The demand for data engineering continues to grow every day.

As a data enthusiast in this field, I think this is a valuable skill that anyone…


What you need to know when building or modifying a Docker image for your use cases

Photo by Drew Collins on Unsplash

Introduction

Recently, I have been building a data pipeline to transfer data from a relational database management system (RDBMS) to the Hadoop ecosystem, specifically the Hadoop Distributed File System (HDFS). With limited time, I had to build the pipeline from scratch within seven days. I was also told that I should try to complete the entire thing within three days.

At that point, I knew next to nothing about Hadoop ecosystem configuration or Docker. I understood the concept of a distributed system, but configuring one by myself seemed almost impossible. It is easy to be scared of what…


A guide to saving time on the most tedious task in the world.

Photo by Vindemia Winery on Unsplash

Introduction

Reporting is the foundation of any business. Every day, you have to take in new data from a report in some form to decide where to go next. Reports come in various formats, such as Microsoft Excel files, web applications, or exports from an enterprise resource planning (ERP) system.

I recently received a request to build a dashboard that replicates the business numbers in a hand-crafted report. The finance team creates this report manually every month. The most tedious step is exporting the source files from the SAP system and manually placing them in Excel. …


3 tips for improving your data frame/graph format

Photo by Markus Spiske on Unsplash

Your daily life data analysis

As a data scientist/analyst, your job is to produce reports that contain insights for business decisions. A report can be built with tools such as Microsoft Excel or SAP, or customized with a programming language such as SAS, R, or Python. The result can be sent to stakeholders by internal email or published through a centralized dashboard.

Like many others, I am a data analyst who uses Python to build reports and presentations every day. My usual assignment is to produce an ad-hoc analysis within 2–3 hours and present it to the management team.

To…


A mindset I use to get back my free time

Photo by Christin Hume on Unsplash

We all have limited time in our life.

24 hours a day is relatively short if you have many things to achieve. We all dream of a productive life in which we get whatever we want done with ease.

However, life is not that easy and throws many distractions at you, especially in 2020.

We all have social media, entertainment platforms, online publications, etc., in our hands.

We can spend a whole day on them without getting bored. This is quite different from ten to twenty years ago.

The more time we spend on those distractions, the less number…


A tutorial to give your machine learning pipeline more visibility

Photo by Alejandro Piñero Amerio on Unsplash

Introduction

A machine learning pipeline is an essential part of a data application. We build it to transform raw data into insightful predictions. The pipeline contains many steps, such as data ingestion, data preprocessing, feature engineering, model fitting, and performance evaluation.

When data scientists start developing an ML pipeline, they try to build the whole pipeline quickly and then iterate by changing hyper-parameters to get the best result. There are many hyper-parameters to tweak in this process.
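To make the iteration loop concrete, here is a minimal sketch of recording the hyper-parameters and score of every run so they can be compared afterwards. It uses only the Python standard library. The `evaluate` function, the search space, and the parameter names are hypothetical stand-ins chosen for illustration, not the tooling used in the tutorial itself.

```python
import itertools


def evaluate(learning_rate, max_depth):
    """Hypothetical stand-in for fitting a model and returning a score."""
    # Toy formula so the example runs without any ML library.
    return round(1.0 - abs(learning_rate - 0.1) - 0.01 * abs(max_depth - 5), 4)


# Hypothetical search space; names and values are illustrative only.
search_space = {
    "learning_rate": [0.01, 0.1, 0.5],
    "max_depth": [3, 5, 7],
}

# Record every combination that was tried, together with its score.
runs = []
for values in itertools.product(*search_space.values()):
    params = dict(zip(search_space.keys(), values))
    runs.append({"params": params, "score": evaluate(**params)})

# With all runs recorded, the best one can be looked up at any time.
best_run = max(runs, key=lambda run: run["score"])
print(len(runs), best_run)
```

Keeping each run as a plain dictionary means the log can later be written to JSON or loaded into a table without changing the loop.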

It would be best if we could track how those hyper-parameters vary. We will gain a deeper understanding of our ML…


REVIEW

Learning new trends from watching a Korean series on Netflix.

Photo by tvN

Spoiler alert: this article may contain information about this drama. Feel free to skip it for now if you have not watched it yet. But, if you don’t mind, let’s dive in!

Recently, I watched a Netflix series called STARTUP. It’s a Korean drama that airs every Saturday and Sunday at 9 PM. The story is about a group of people who dream of establishing a startup business of their own.

Seems straightforward and not very interesting, right?

But the exciting part is that the main character of this series is…


OPINION

To strengthen your company’s data foundation, let’s do it

Photo by Debby Hudson on Unsplash

Data analytics, data science, and data engineering have grown considerably in popularity over the last few years. This has created a new standard for the industry: every company needs to invest in or establish a data office within its organization.

In 2020, it has become standard to have a prediction model for marketing leads, to improve your check-in method with facial recognition, or to consult an elegant dashboard when making a business decision.

Exceptional use cases always come first to build the momentum of the analytics trend. Executives want to see a result before investing a massive amount of funds into a new direction.

The…


A code snippet for checking your data quality

Image by the author, with a background photo by ThisisEngineering RAEng on Unsplash

In the previous article, I pointed out the importance of data quality and the issues surrounding it.

The quicker you realize the problem with your data, the better you can deliver a valid conclusion to drive the business.

When you have limited time for analysis, I hope this tutorial serves as a checklist for verifying the condition of your data before you present to an audience.

Today I will show you code snippets for checking the condition of your data. The topics cover units of analysis, missing values, duplicated records, whether your data makes sense, and truth that changes over time.
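As a rough illustration of four of these checks, here is a minimal sketch that uses only the Python standard library. The sample records, column names, and helper functions are all hypothetical and are not taken from the tutorial itself. The check for truth that changes over time is left out because it depends on how history is stored.

```python
from collections import Counter

# Hypothetical records; the column names are made up for illustration.
records = [
    {"customer_id": 1, "month": "2020-01", "sales": 120.0},
    {"customer_id": 2, "month": "2020-01", "sales": None},
    {"customer_id": 2, "month": "2020-01", "sales": None},
    {"customer_id": 3, "month": "2020-01", "sales": -5.0},
]


def repeated_keys(rows, key_columns):
    """Unit of analysis: keys that identify more than one row."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def missing_counts(rows):
    """Missing values: number of None entries per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return dict(counts)


def duplicated_rows(rows):
    """Duplicated records: fully identical rows and how often they occur."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def out_of_range(rows, column, minimum):
    """Sanity check: rows whose value falls below an expected minimum."""
    return [r for r in rows if r[column] is not None and r[column] < minimum]


print(repeated_keys(records, ["customer_id", "month"]))
print(missing_counts(records))
print(duplicated_rows(records))
print(out_of_range(records, "sales", 0))
```

Each helper returns plain dictionaries or lists, so an empty result means the check passed and a non-empty one points directly at the offending rows or columns.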

The tutorial will be…

Pathairush Seeda
