Data Factory Implementation

Customer
JSC "Kazakhtelecom"
Project manager on the customer side
Marat Abdildabekov
CIO
IT Provider
Axellect
Year of project completion
2023
Project timeline
June 2022 - November 2023
Project scope
22,300 man-hours
Goals
Before the "Data Factory Implementation" project started, the following goals were set:
1) Set up a clear strategic roadmap for Data Factory development over a 3-year horizon;
2) Provide Data Factory consumers with user-friendly instruments (a data model, data marts, and data governance tools) to work with data effectively;
3) Establish standardized, company-wide approaches and processes for handling datasets;
4) Increase the conversion rate of analytical hypotheses into working solutions that drive business value;
5) Decrease Time-2-Market for implementing any data asset (data mart, BI dashboard, ML model, etc.);
6) Demonstrate to the company's executives the importance of investing in data management and data governance.

Project Results
The main results of the project are:

Qualitative:
1) A single view of Data Function development;
2) A clear operating model (roles and processes);
3) An approach to driving the Data Function that the client can apply independently in the future.

Quantitative:
1) Time-2-Market for implementing any data asset (data mart, BI dashboard, ML model, etc.) decreased by ~17%;
2) The conversion rate of analytical hypotheses into working solutions increased 2.5×.


The uniqueness of the project

The uniqueness of the project lies in its clear orientation toward business impact. "Data Factory Implementation" provided d-people (engineers, analysts) and business users with a data ecosystem offering all the instruments required to process huge amounts of data effectively: precalculated data marts, data discovery tools (glossary and catalog), a data quality engine, and enablers for building and testing advanced analytical models. Additionally, during the project, data governance processes covering the full cycle of data asset development were established and incorporated into the operational activities of data-related teams.
All of the above was measured with clear business metrics:
1) Time-2-Market for implementing any data asset (data mart, BI dashboard, ML model, etc.) decreased by ~17%;
2) The conversion rate of analytical hypotheses into working solutions increased 2.5×.

Used software
ETL instruments: Apache Airflow, Luigi, RabbitMQ, Apache Spark, Dask, and Ray;
Streaming ETL: Informatica Change Data Capture (CDC) and Apache Kafka;
Storage system: Hadoop HDFS with Apache Impala, Hive Metastore, Hadoop Yarn, Apache Ranger;
MPP database: Greenplum;
Analytics engine: ClickHouse;
Data Governance tools: Informatica Axon Data Governance, Informatica EDC (Enterprise Data Catalog), Informatica Data Quality.
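The project's pipeline code is not public, but the batch pattern these ETL tools orchestrate (extract raw events, aggregate them into a precalculated data mart) can be sketched in plain Python. The subscriber and usage fields below are hypothetical illustrations, not the actual schema:

```python
from collections import defaultdict

def build_subscriber_mart(raw_rows):
    """Aggregate raw usage events into one data-mart row per subscriber.

    `raw_rows` is a list of dicts with hypothetical fields:
    subscriber_id, service, bytes_used.
    """
    totals = defaultdict(lambda: {"services": set(), "bytes_used": 0})
    for row in raw_rows:
        agg = totals[row["subscriber_id"]]
        agg["services"].add(row["service"])
        agg["bytes_used"] += row["bytes_used"]
    # Emit one mart row per subscriber, sorted for deterministic output.
    return [
        {
            "subscriber_id": sid,
            "service_count": len(agg["services"]),
            "bytes_used": agg["bytes_used"],
        }
        for sid, agg in sorted(totals.items())
    ]
```

In production, a step like this would run as a Spark or Dask job scheduled by Airflow or Luigi; the sketch only shows the transformation itself.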

Difficulty of implementation
The main difficulties of the "Data Factory Implementation" project can be divided into technical and organizational.
On the technical side, the main challenge was integrating such a fragmented and sophisticated IT landscape into the Data Factory: more than 100 internal and external sources, each with its own business logic, producing about 4 petabytes of data. Integrating these datasets required cross-functional engagement: both d-people (engineers, analysts) and business users contributed.
On the organizational side, it was quite time-consuming to move the company's employees from legacy habits to new ways of working with data: providing descriptions, quality rules, and lineage for their data assets. Another challenge was explaining what "data ownership" actually means and what would change for a particular user. This demanded considerable educational and change-management effort from the project leaders.

Project Description
The "Data Factory Implementation" project included two milestones:

1) Data Strategy Development
2) Data Strategy Implementation (DWH refactoring and incorporation of data governance processes into day-to-day activities).

Data Strategy Development consisted of the following parts:
1.1) As-Is Assessment: JSC "Kazakhtelecom" was assessed across its data landscape, its data governance operating model (people and processes), and its analytical use cases;
1.2) To-Be Model Creation: the target data architecture (including functional components and particular vendors) was designed, along with the operating model (target roles, responsibilities, and processes for working with data). Moreover, the corporate data model concept was created using an industry model (TM Forum) as a reference. Last and most important is the 3-year roadmap, which lists the initiatives and the budget required for their implementation.
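As a rough illustration of what a TM Forum-inspired corporate data model can look like, here is a minimal sketch using Python dataclasses. The entities and fields are hypothetical examples, not the project's actual model:

```python
from dataclasses import dataclass, field

# Illustrative entities loosely following TM Forum information-model
# domains (Customer, Product, Service); names are hypothetical.

@dataclass
class Service:
    service_id: str
    service_type: str  # e.g. "broadband", "voice"

@dataclass
class Product:
    product_id: str
    name: str
    services: list[Service] = field(default_factory=list)

@dataclass
class Customer:
    customer_id: str
    segment: str  # e.g. "retail", "corporate"
    products: list[Product] = field(default_factory=list)

    def active_service_types(self) -> set[str]:
        # Which service types the customer consumes across all products.
        return {s.service_type for p in self.products for s in p.services}
```

A shared model like this gives data marts and governance tools one agreed vocabulary for entities and their relationships.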

As for Data Strategy Implementation, it was divided into two parts:
2.1) Organizational aspect: several data governance training seminars explained the particular changes in the day-to-day operations of the business lines. After the training courses, the operating model was piloted in the Retail business line, then corrected based on lessons learned and scaled to the whole organization.
2.2) Technical aspect: it was decided to set up a full track for one concrete data domain; managerial reporting was chosen. It began with business requirements collection, followed by prototyping and development of the data model and data mart layers (on the new functional components). Next, the data governance tools came into play: the required business and technical metadata were registered in the Data Catalog and Business Glossary, with lineage created simultaneously. Finally, data quality rules were collected and onboarded into the Data Quality tool to provide continuous monitoring.
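The kind of rule-based monitoring a data quality tool performs can be sketched as follows; the rule names and fields below are hypothetical examples, not the project's real quality rules:

```python
def run_quality_checks(rows, rules):
    """Apply named data-quality rules to table rows and report failures.

    `rules` maps a rule name to a predicate over a single row dict.
    Returns, per rule: how many rows were checked, how many failed,
    and the indices of the failing rows.
    """
    report = {}
    for name, predicate in rules.items():
        failed = [i for i, row in enumerate(rows) if not predicate(row)]
        report[name] = {"checked": len(rows), "failed": len(failed), "rows": failed}
    return report

# Example rules: a completeness check and a validity check.
rules = {
    "msisdn_not_null": lambda r: bool(r.get("msisdn")),
    "revenue_non_negative": lambda r: r.get("revenue", 0) >= 0,
}
```

Running such checks on a schedule, with results surfaced to the assigned business owner, is what turns one-off validation into continuous monitoring.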

This approach was approved together with the client and used to drive iterative refactoring of the previous data warehouse solution.



Project geography
The project scope includes all branches of JSC "Kazakhtelecom": the Retail, Corporate, and Technical business lines, the Information Technology division, and the Central division, which maintains the functioning of all the organization's processes. All in all, the following project metrics can be highlighted:
1) More than 100 internal and external sources producing about 4 petabytes of data were integrated into the Data Factory;
2) More than 8,000 raw tables were converted into data models and data marts, each with descriptions, quality rules, and assigned business owners recorded in the data governance tools;
3) More than 10 data governance processes were formulated, formalized, and embedded into day-to-day operations;
4) More than 100 people (internal employees, integrators, and consultants) took part in the project;
5) The full project, including strategy development and implementation, took a year and a half.

The winners of the Project of the Year 2023 competition are IT leaders from India, Malaysia, Turkey, Kazakhstan, Uzbekistan, Kyrgyzstan, Armenia, Iraq and Brazil. 
