# Data Science
Data Science shows up on this blog whenever raw measurements start to demand structure, interpretation, and careful statistical treatment. Much of my day-to-day work involves extracting signals from complex data sets, so writing about these topics comes naturally. Posts tagged this way reflect practical encounters with messy experimental data, exploratory analysis, visualization, and model-based inference. They range from small methodological notes to more extended discussions of analysis pipelines, always with an emphasis on transparency, reproducibility, and the craft of turning numbers into scientific arguments rather than polished dashboards.
There are currently 62 articles with this tag (newest first):
Distinguishing correlation from the coefficient of determination: Proper reporting of r and R²
I noticed that people sometimes report R² (‘R-squared’) instead of the Pearson correlation coeffi...
New teaching material: Functional imaging data analysis – From calcium imaging to network dynamics
We have just completed our new course, Functional Imaging Data Analysis: From Calcium Imaging to ...
Miniforge: The minimal, open solution for institutional Python environments
In response to recent licensing changes by Anaconda, Inc., Miniforge has emerged as the recommend...
New teaching material: Dimensionality reduction in neuroscience
We just completed a new two-day course on Dimensionality Reduction in Neuroscience, and I am plea...
PyTorch on Apple Silicon
Some time ago, PyTorch became fully available for Apple Silicon. It’s no longer necessary...
Understanding Hebbian learning in Hopfield networks
Hopfield networks, a form of recurrent neural network (RNN), serve as a fundamental model for und...
Building a neural network from scratch using NumPy
Ever thought about building your own neural network from scratch by simply using NumPy? In this po...
Python’s version logos
Have you ever noticed that Python has introduced individual version logos starting with version 3...
Conditional GANs
I was wondering whether it would be possible to let GANs generate samples conditioned on a specif...
Eliminating the middleman: Direct Wasserstein distance computation in WGANs without discriminator
We explore an alternative approach to implementing WGANs. In contrast to the standard implemen...
Wasserstein GANs
We apply the Wasserstein distance to Generative Adversarial Networks (GANs) to train them more ef...
Probability distance metrics in machine learning
Probabilistic distance metrics play a crucial role in a broad range of machine learning tasks, in...
Comparing Wasserstein distance, sliced Wasserstein distance, and L2 norm
In machine learning, especially when dealing with probability distributions or deep generative mo...
Approximating the Wasserstein distance with cumulative distribution functions
In the previous two posts, we’ve discussed the mathematical details of the Wasserstein distance, ...
Wasserstein distance via entropy regularization (Sinkhorn algorithm)
Calculating the Wasserstein distance can be computationally costly when using linear programming. T...
Wasserstein distance and optimal transport
The Wasserstein distance, also known as the Earth Mover’s Distance (EMD), provides a robust and i...
Visualizing Occam’s Razor through machine learning
Here, we illustrate the concept of Occam’s Razor, a principle advocating for simplicity, by exami...
Mamba vs. Conda: Unleashing lightning-fast Python package installations
If you’ve ever experienced the frustration of waiting for ages while installing Python packages w...
Assessing animal behavior with machine learning: New DeepLabCut tutorial
I have added a hands-on tutorial to the Assessing Animal Behavior lecture. The tutorial covers th...
Assessing animal behavior with machine learning
High-throughput and multi-modal behavior experiments, coupled with machine learning analysis, unl...
Bioimage analysis with Napari
I’ve added new teaching material on using the free and open-source software (FOSS) Napari for bio...
Using random forests for pixel classification
Beyond traditional classification problems, random forests have proven their effectiveness in pix...
Decision Trees vs. Random Forests for classification and regression: A comparison
Decision trees and random forests are popular machine learning algorithms that are widely used fo...
Image denoising techniques: A comparison of PCA, kernel PCA, autoencoder, and CNN
In this post, we explore the performance of PCA, kernel PCA, denoising autoencoder, and CNN for i...
Using Autoencoders to reveal hidden structures in high-dimensional data
In this Python tutorial, we explore the application of autoencoders for dimensionality reduction,...
Unlocking hidden patterns with Factor Analysis
In this Python tutorial, we dive into Factor Analysis, a powerful statistical method used to unco...
Untangling complexity: Harnessing PCA for data dimensionality reduction
This tutorial explores the use of Principal Component Analysis (PCA), a powerful tool for reducin...
t-SNE and PCA: Two powerful tools for data exploration
Dimensionality reduction techniques play a vital role in both data exploration and visualization....
Understanding L1 and L2 regularization in machine learning
Regularization techniques play a vital role in preventing overfitting and enhancing the generaliz...
Understanding gradient descent in machine learning
Gradient descent is a fundamental optimization algorithm widely used in machine learning for find...
Loading and saving files in Google Colab
Enable I/O support in your notebooks running in Google Colab with just a few additional commands.
Mutual information and its relationship to information entropy
Mutual information is an essential measure in information theory that quantifies the statistical ...
Information entropy
A fundamental concept that plays a pivotal role in quantifying the uncertainty or randomness of a...
Understanding entropy
In physics, entropy is a fundamental concept that plays a crucial role in understanding the behav...
Bio-image registration with Python
Which method works best for which registration problem? In this tutorial we compare different met...
How to run PyTorch on the M1 Mac GPU
As for TensorFlow, it takes only a few steps to enable a Mac with an M1 chip (Apple silicon) for mac...
How to run TensorFlow on the M1 Mac GPU
In just a few steps you can enable a Mac with an M1 chip (Apple silicon) for machine learning tasks ...
Is there a difference between miniconda and miniforge?
Simply said: not really. Miniconda is the company-driven minimal conda installer, while miniforge...
Hacks and extensions to improve your coding with Visual Studio Code
This curated list contains useful hacks and extensions to improve the overall coding performance ...
Setting up Visual Studio Code for Python
In just a few steps you can turn Visual Studio Code (VS Code) into a powerful Python editor for b...
Enable interactive plots and other plot modes in Jupyter notebooks
Learn how to enable interactive, static and stand-alone window plots in Jupyter notebooks with th...
Enable code folding in JupyterLab
Learn how to enable code folding in JupyterLab for both Jupyter notebooks and pure Python scripts.
How to create and apply a requirements.txt file in Python
Learn how to install Python packages with a requirements.txt file and how to create one yourself.
Virtual environments with venv
In addition to conda’s create command, Python’s built-in venv command offers another way for cre...
Using pip to install Python packages
pip is another package installer for Python. Learn how to use it for installing and managing Pyth...
How to install and run Python code from GitHub
Learn how to install code from GitHub that is not (yet) available via conda or pip.
A minimal Python installation with miniconda
Learn how to install miniconda to have a quick and minimal Python installation on any operating s...
Stable installation of Napari on an M1 Mac
In case you’re having problems installing Napari on your M1 Mac, try to install it from conda ins...
Open Zarr files in Fiji
Both Zarr and OME-ZARR files are supported in Fiji. Here’s how to get it working.
Using Zarr for images – The OME-ZARR standard
As for any other NumPy array, we can use the Zarr file format to store image files. In this post ...
Zarr – or: How to efficiently save NumPy arrays
What is Zarr and why is it probably the most suitable file format for saving NumPy arrays?
How to read patch clamp recordings in WaveMetrics IGOR binary files (ibw) in Python
This is a mini tutorial on how to read patch clamp recordings in WaveMetrics IGOR binary files (*...
How to add statistical annotations to matplotlib plots
This mini tutorial shows how to add statistical annotations to matplotlib plots with just a few ...
Make matplotlib plots look more appealing with just a few extra commands
Learn how to enhance matplotlib plots with just a few hacks.
Variable explorer in Jupyter notebooks
Extend your Jupyter environment with Notebook Extensions and enable, e.g., the option to explore ...
Opening a Jupyter notebook from GitHub in Binder: A step-by-step guide
Opening a Jupyter notebook from GitHub in Binder simplifies access to shared code and facilitates...
New Teaching Material: Python Cheat Sheets
I’ve started a collection of various Python cheat sheets that contain some useful and commonly us...
New Teaching Material: Statistical data analysis and basic time series analysis with Python
I’ve added two new tutorials in the teaching section on statistical data analysis and basic time ...
New Teaching Material: Analyzing IGOR binary files of patch clamp recordings
I’ve added a new tutorial in the teaching section on how to read and process IGOR binary files (i...
The Lotka-Volterra equations: Modeling predator-prey dynamics
The Lotka-Volterra system, also known as the predator-prey equations, is a mathematical model tha...
Interactive COVID-19 data exploration with Jupyter notebooks
Amidst the ongoing challenges of the COVID-19 pandemic, I have written a Jupyter notebook that fa...
The SIR model: A mathematical approach to epidemic dynamics
In the wake of the COVID-19 pandemic, epidemiological models have garnered significant attention ...