Research Papers
-
Predicting subclinical ketosis in dairy cows using machine learning techniques - Diagnosing subclinical ketosis in dairy cows from blood ketone bodies is a challenging and costly procedure. To reduce complexity and cost, scientists are searching for tools based on milk performance assessment results that would allow monitoring the risk of subclinical ketosis. The objective of the study was to develop and validate classification and regression models based on machine learning (ML) algorithms that predict subclinical ketosis in cows using data from test-day records. My role in this research was to support the scientists in the proper design and implementation of the ML methods (as described in Materials and Methods).
-
Effectiveness of Different Analytical Techniques – Statistical and Machine Learning Models – in Particulate Matter Fine Particles Level Forecasting - According to the World Health Organization (WHO), air pollution is the biggest environmental risk to health in the European Union (EU), causing about 400 000 premature deaths each year and hundreds of billions of euros in health-related external costs. The goal of this study is to describe the approach and scientific method used to compare the effectiveness of different analytical techniques (basic, statistical and machine learning models) that can be used to predict levels of fine particulate matter (PM2.5), the most harmful pollutant. The insights and methods, such as the best-performing group of algorithms, will be used in later iterations to build an interactive tool for PM2.5 level forecasting.
-
A Gentle Introduction to Computational Statistics - There are two perspectives from which we can approach statistical problems: analytical and computational. In this article, I will walk you through two example problems (simulating outcomes and permutation testing) and compare how each can be solved using both approaches.
Analytical Projects and Data Products
-
Bike Share Data (Python Console Apps) - Over the past decade, bicycle-sharing systems have been growing in number and popularity in cities across the world. Bicycle-sharing systems allow users to rent bicycles on a very short-term basis for a price. In this project, the main objective is to use data provided by a bike share system provider to uncover bike share usage patterns.
-
US Airline Delay Statistics (Tableau Dashboard) - The main objective of the project is to create a data visualization that tells a story highlighting trends and patterns (factors minimizing risks related to delayed or cancelled flights) in a dataset on United States flight delays and performance.
-
Exploratory Data Analysis of Red Wine Quality Dataset (Analysis in R) - The goal of this exploratory data analysis (EDA) is to better understand which features of red wine may have the most impact on whether its quality is good or bad (version including R code).
-
Exploring Countries of the World Dataset (Blog Post) - The world around us is fascinating and diverse. When I found out about the Countries of the World dataset, I decided to take the opportunity to dig deeper into it and answer a few questions I had in mind.
-
WeRateDogs Twitter Data Wrangling (APIs Handling) - The objectives of this study were to perform data wrangling (gathering, assessing and cleaning) on three provided data sources; to store, analyze, and visualize the wrangled data; and finally to report on 1) the data wrangling efforts and 2) the data analyses and visualizations (as separate documents).
About Me
Krzysztof Satola: LinkedIn Profile and GitHub Repositories.