Data Workshop is the platform for building complex projects that implement Machine Learning, Artificial Intelligence, custom APIs, ETL, custom data pipelines, and much more. It is a work area for creating specialized solutions to the specific problems of any organization that works with data.
Because DataKubes is a comprehensive data management platform for an organization, the solutions that can be created on it are virtually limitless. With complete tools for development, visualization, storage, processing, integration, and much more, a data team can implement solutions of all kinds in DataKubes.
Projects are work areas within the DataKubes Data Workshop (DK-DW). Each project groups different types of objects, which fall into the following classification within the flow of a data solution.
| Object type | Description |
| --- | --- |
| Data Pipes | Use DataKubes DataPipes to load your model data; you can extract data from MySQL, PostgreSQL, CSV, JSON, and ODBC sources into your repository tables. |
| Extract, Transform, Prepare | Objects that extract data from complex sources not supported by DataPipes, or that transform and/or prepare the data already in the repository. |
| Train, Test, Predict | Objects that train, test, and predict future results using algorithms, so that Machine Learning libraries can work against the tables in the project repository. |
| Alert Rules | Development objects for the complex analysis of data rules, used to find matches and send notifications, together with the Alert Rules that are part of the project. |
| Visualization | Objects that use the DataKubes analytics engine to visualize the results of data models and of processes applied to data in repositories. |
| Web Application / Application | Applications that run as persistent containers to collect, manage, or deliver complete applications to your organization's end users. |
| API Tokens | Once your models and data are ready, you can share access to them through the DataKubes APIs using API Tokens. A token grants access to perform administrative operations on the project, or to share data already processed in tables or Kubes in the repository. |
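As a hedged illustration of the API Tokens row above: the endpoint URL, header name, and token value below are invented for the example, not documented DataKubes values. A token of this kind would typically travel with each request:

```python
# Hypothetical sketch of attaching an API Token to a request. The endpoint
# URL and the Bearer-style header are assumptions, not documented values.
from urllib.request import Request

API_TOKEN = "dk_example_token"  # placeholder, not a real credential
ENDPOINT = "https://api.example.com/v1/repository/tables/sales"  # invented URL

def build_request(url: str, token: str) -> Request:
    """Return a GET request carrying the token in an Authorization header."""
    return Request(url, headers={"Authorization": f"Bearer {token}"})

req = build_request(ENDPOINT, API_TOKEN)
print(req.get_header("Authorization"))
```

The request is only constructed here, not sent; in practice the platform's API documentation defines the real endpoints and authentication scheme.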
To create a project in the Data Workshop, simply click the "+" icon next to the search bar, as shown in the following image:
This opens the project creation screen, which requests the fields needed to identify the project.
Once the project is created, you can add the different objects it needs.
Currently, DataKubes Data Workshop supports the following development environments, depending on the need or specialty:
This environment allows you to quickly develop applications or processes for APIs, web applications, ETL, and more. The currently installed modules are:
Core, curl, date, ftp, gd, hash, iconv, imagick, imap, json, libxml, mbstring, mcrypt, mysqli, openssl, PDO, pdo_sqlite, redis, SimpleXML, sqlite3, zip, zlib, xml, xmlreader, xmlwriter
This environment is the preferred choice for creating Machine Learning models and advanced analysis, thanks to its robust library of models and algorithms that facilitate prediction and AI on your data objects. The currently preloaded modules are:
matplotlib, med2image, nibabel, pillow, pydicom, TensorFlow, TensorFlow-GPU, Keras, mysqlclient, pyodbc, torch, torchvision, NumPy, pandas, scipy, nltk, orjson.
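As a small sketch of how this environment's preloaded libraries support the train/predict pattern: the data, column names, and the use of a simple linear fit below are illustrative assumptions, not a DataKubes API.

```python
# Minimal train/predict sketch using NumPy and pandas, both preloaded in
# this environment. The dataset and column names are invented for the example.
import numpy as np
import pandas as pd

df = pd.DataFrame({"month": [1, 2, 3, 4, 5, 6],
                   "sales": [10.0, 12.1, 13.9, 16.2, 18.0, 20.1]})

train = df.iloc[:4]  # "train" on the first four months only
slope, intercept = np.polyfit(train["month"], train["sales"], deg=1)

# "Predict" across all months, including the two held-out ones
df["predicted"] = slope * df["month"] + intercept
print(df.round(1))
```

In a real project, the same pattern would read from repository tables and use heavier libraries such as TensorFlow or torch from the list above.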
This environment allows you to develop web applications or consumable APIs that persist as servers. The currently installed modules are:
Core, curl, date, ftp, gd, hash, iconv, imagick, imap, json, libxml, mbstring, mcrypt, mysqli, openssl, PDO, pdo_sqlite, redis, SimpleXML, sqlite3, zip, zlib, xml, xmlreader, xmlwriter, apache, mod_ssl, letsencrypt
The selection of the environment is made when creating objects in the project, as shown in the following image:
Creating a complete AI solution requires many elements and processes. Organizing each project well allows you to create objects that in turn contain complex processes which, when grouped, result in a solution to a specific need.
The types of Objects fall into two categories:
- Processing Objects
- Visualization Objects
Processing objects execute advanced code that reads, processes, applies ML models, and stores the results in the project's data repository.
Visualization objects are Kubes (DataKubes cubes) used to visualize data straightforwardly through the embedded DataKubes visualization engine. Each Kube created in a project is available to users with permissions from Insights.
These objects allow you to create applications that run permanently to serve custom applications to users, or to create integration APIs that wait to be consumed at any time. Such objects may remain active indefinitely, unlike Processing Objects, which run and return execution results each time they are activated and do not remain active.
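The difference can be sketched in plain Python. Nothing below is DataKubes-specific; it only contrasts a run-once task with a service that stays up:

```python
# Illustrative contrast between a run-once "processing" task and a
# persistent "daemon" service; no DataKubes internals are assumed.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

def processing_task(rows):
    """Run-once object: process the input, return results, then exit."""
    return [r * 2 for r in rows]

class EchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"alive")

    def log_message(self, *args):
        pass  # keep the example quiet

def start_daemon(port):
    """Daemon object: binds a server that keeps answering until stopped."""
    server = HTTPServer(("127.0.0.1", port), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

print(processing_task([1, 2, 3]))  # runs once and is done
server = start_daemon(8765)
print(urlopen("http://127.0.0.1:8765").read().decode())  # daemon answers
server.shutdown()
```

The processing function finishes as soon as it returns, while the daemon keeps serving requests until it is explicitly shut down, mirroring the restart/turn on/turn off controls described later.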
DataKubes uses Docker to execute processing objects and daemon/application objects, allowing them to run independently and safely. Currently, DataKubes offers two resource environments for running these containers:
DataKubes Shared Container Resources: computing resources shared among all users who require them. This is the most cost-effective and controlled way to implement complete data solutions, paying only for the usage of that cluster.
DataKubes Dedicated Container Resources: a dedicated cluster with computing resources assigned to a single organization. As such, it offers unlimited consumption of the allocated resources and allows total growth. This model has no usage cost beyond the allocated dedicated resources.
Every DataKubes account has access to the DK Shared Container Cluster and is charged for its consumption as it is used. *
Dedicated Container Resources Access
To access these resources, you need to upgrade to the Enterprise plan, which also unlocks other functionality.
Apart from the visualization objects, all other objects offer a development interface for creating applications quickly and easily.
If we open a Machine Learning-type processing object, we can see the following image:
Each object allows you to add more files, which are compiled at execution time.
Visualization objects, for their part, are Kubes designed with the DataKubes visualization tool, as in our demo project:
When a project is created, its working repository is defined. The repository contains the data tables that the project's objects will use, and it is selected at project creation time in the DK-DW, as shown in the following picture:
Once the project is created, the repository cannot be changed; if you need a different repository, we recommend cloning the project.
Once the project's repository is defined, all of its objects point to it to obtain their data.
Instructions on how to consume the repository defined in the project are included in each development object in the programming environment. For example, in our Python or PHP object we can find a file called *welcome.py* / *welcome.php*, which can be seen in the following image:
The content of the welcome file shows, depending on the selected platform, how to connect to the project repository.
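As a hypothetical sketch of what such a welcome snippet might assemble: the environment variable names and DSN format below are assumptions for illustration, not the actual contents of the file.

```python
# Hypothetical sketch: assembling a repository connection string from
# credentials the platform could inject. The DK_REPO_* variable names and
# the DSN layout are invented for this example.
import os

def repository_dsn(env=os.environ):
    """Build a MySQL-style DSN from (assumed) injected settings."""
    host = env.get("DK_REPO_HOST", "localhost")
    user = env.get("DK_REPO_USER", "project")
    name = env.get("DK_REPO_NAME", "repository")
    return f"mysql://{user}@{host}/{name}"

print(repository_dsn({}))  # falls back to the defaults above
```

The real welcome file documents the actual connection details for the project's repository; this only illustrates the general shape of such a snippet.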
Each object in a project, if its type supports it, can be executed individually. For example, in the case of Processing Objects, we can see the following image:
The *Run* button starts the execution flow of the object.
Serverless or daemon-type objects show additional options, such as restart, turn on, and turn off, to control the services they provide. For example, see the following image:
In addition to the control options, the system shows the current status of the serverless container.
In the Data Workshop, you can clone projects, or objects within a project, to speed up the reuse of code and simplify work already carried out, making it possible to run that work against another repository. When you clone an object, it is cloned within the project the original object belongs to. When cloning a project, you can define a different repository for the clone.
Clone a project to another repository
When cloning a project to another data repository, it is essential to recreate the tables used by the project in the target repository; otherwise, the cloning will not occur.
Once your project is ready, you can schedule its execution automatically by pressing Schedule Execution in the list of projects:
Once this opens, the scheduling screen allows you to adjust the execution status, which can be:
You can also configure the months, days of the month, and days of the week on which it runs.
Project Execution Hours
At DataKubes, all projects are scheduled to run at 2 AM on the days their schedule applies, and they run in the order internally assigned.
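The month/day/weekday filtering described above can be sketched as a simple membership check; the rule structure below is an assumption for illustration, not the platform's internal format.

```python
# Illustrative sketch of a month/day/weekday schedule match. The shape of
# the constraints (sets of allowed values) is an invented simplification.
from datetime import date

def schedule_matches(d, months=None, days=None, weekdays=None):
    """True when date d satisfies every configured constraint.
    A constraint left as None means 'every value is allowed'."""
    return ((months is None or d.month in months) and
            (days is None or d.day in days) and
            (weekdays is None or d.weekday() in weekdays))

# Run only on Mondays (weekday 0) in January and July:
print(schedule_matches(date(2024, 1, 1), months={1, 7}, weekdays={0}))
```

Under this sketch, a project whose schedule matched the current date would then be queued for the 2 AM run in its assigned order.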