Daany – .NET DAta ANalYtics library
Introduction
Daany is a .NET data analytics library written in C#. It is intended as a tool for data preparation, feature engineering and other kinds of data transformation that are needed before an ML-ready data set can be created. It is a .NET Core based library that runs on Windows, Linux distributions and Mac, and it targets .NET Standard 2.1.
Besides data analysis, the library implements a set of statistics and data science features, e.g. time series decomposition, optimization, performance parameters and similar.
Currently the Daany project consists of four main components: Daany.DataFrame, Daany.Stat, Daany.MathStuff and Daany.DataFrame.Ext.
The main Daany component is Daany.DataFrame, a data frame implementation for data analysis. It is much like Pandas, but the component is not going to follow the pandas implementation. It is suitable for data exploration and preparation with C# Jupyter Notebook. Creating a data frame or loading data into one does not require any predefined class type. Instead, all data are parsed internally during data frame creation in order to determine the value type of each column. Daany.DataFrame implements a set of powerful features for data manipulation, handling missing values, calculated columns, merging two or more data frames into one, and similar. It is also handy for extracting rows or columns as series of elements that can be put into a chart to visualize the data.
Daany.Stat is a collection of statistics features, e.g. time series decomposition, optimization, performance parameters and similar.
Daany.Math is a component within the data frame that implements matrices and related linear algebra capabilities. It also contains some implementations from other great open source projects. The component is not going to be a separate NuGet package.
Daany.DataFrame.Ext contains extensions for the Daany.DataFrame component that are related to other projects, mostly ML.NET. Daany.DataFrame should not depend on ML.NET or other libraries, so any future data frame feature that depends on something other than Daany.Math should be placed in Daany.Ext.
The project grew out of my need to have a set of data transformation features in one library while working on machine learning, and I thought it might help others too. The library already has quite a lot of data transformation features and might be your number one data analytics library on the .NET platform. Contributions to the project are welcome.
How to start with Daany
Daany is a 100% .NET Core component and can run on any platform .NET Core supports, from Windows x86/x64 to Mac or Linux based OS. It can be used from Visual Studio or Visual Studio Code. It consists of 3 NuGet packages, so the easiest way to start with it is to install the packages in your .NET application. Within Visual Studio, create or open your .NET application and open the NuGet packages window. Type Daany in the browse edit box and hit Enter. You can find four packages starting with Daany. You have a few options for installing the packages:
- Install Daany.DataFrame only. Use this option if you only want data analysis using the data frame. Once you click the Install button, Daany.DataFrame and Daany.Math will be installed into your project.
- Install the Daany.Stat package. This package already contains DataFrame, as well as time series decomposition and related statistics features.
Once you install the packages, you can start developing your app using Daany
packages.
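If you prefer the command line to the Visual Studio NuGet window, the same installation can be sketched with the .NET CLI. This assumes the .NET SDK is installed and that the commands are run from the folder containing your project file; the package IDs are the ones named in the options above.

```shell
# Option 1: data frame only
# (assumes the current directory contains your project file)
dotnet add package Daany.DataFrame

# Option 2: statistics features, which per this post already include the data frame
dotnet add package Daany.Stat
```

Only one of the two commands is needed, depending on which option you picked.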
Using Daany as an assembly reference
Since Daany has no dependency on other libraries, you can copy three DLLs and add them as references to your project. To do so, clone the project from http://github.com/bhrnjica/daany, build it, and copy Daany.DataFrame.dll, Daany.Math.dll and Daany.Stat.dll to your project as assembly references. The whole project is just 270 KB.
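The clone-and-build step can be sketched roughly as follows. Treat it as an outline only: it assumes git and the .NET SDK are installed, and that the repository root contains a solution or project file that dotnet build can pick up. The output folder of the build is also an assumption, so the last command simply searches for the DLLs instead of naming a path.

```shell
# Clone the repository mentioned in the text
git clone http://github.com/bhrnjica/daany
cd daany

# Build in Release configuration
# (assumption: a solution or project file sits at the repository root)
dotnet build -c Release

# List the built Daany assemblies so they can be copied into your own project
find . -name "Daany.*.dll" -path "*Release*"
```

From the listed files, copy Daany.DataFrame.dll, Daany.Math.dll and Daany.Stat.dll and add them as references in your project.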
Using Daany with .NET Jupyter Notebook
Daany is a great fit for .NET Jupyter Notebook, and several notebooks are already implemented and can be viewed at http://github.com/bhrnjica/notebooks. The GitHub project contains the code necessary to run the notebooks in Binder, a Jupyter virtual environment, so you can try Daany without any local installation. The first recommendation is therefore to try Daany with the existing notebooks using Binder.
Namespaces in Daany
The Daany project contains several namespaces that separate the different implementations. The following list contains the relevant namespaces:
- using Daany: data frame and related code implementation,
- using Daany.Ext: data frame extensions that depend on third-party libraries,
- using Daany.MathStuff: math related stuff implemented in Daany,
- using Daany.Optimizers: set of optimizers like SGD,
- using Daany.Stat: set of statistics implementations in the project.
That's all for this post. The next blog posts will show more exciting implementations using Daany.