
Data Factory Implementation

Customer
JSC "Kazakhtelecom"
Project manager on the customer side
Marat Abdildabekov
CIO
IT Provider
Axellect
Year of project completion
2023
Project timeline
June 2022 - November 2023
Project scope
22,300 man-hours
Goals
Before the "Data Factory Implementation" project started, the following goals were set:
1) Set up a clear strategic roadmap for Data Factory development over a 3-year horizon;
2) Provide Data Factory consumers with user-friendly instruments (data model, data marts, and data governance tools) to work with data effectively;
3) Establish standardized, company-wide approaches and processes for processing datasets;
4) Increase the conversion rate of analytical hypotheses into working solutions that drive business value;
5) Decrease Time-2-Market for the implementation of any data asset (data mart, BI dashboard, ML model, etc.);
6) Highlight to the company's executives the importance of investing in data management and data governance.

Project Results
The main results of the project are:

Qualitative:
1) A single view of Data Function development;
2) A clear operating model (roles and processes);
3) An approach to running the Data Function that the client can use independently in the future.

Quantitative:
1) Time-2-Market for the implementation of any data asset (data mart, BI dashboard, ML model, etc.) decreased by ~17%;
2) The conversion rate of analytical hypotheses into working solutions increased 2.5x.


The uniqueness of the project

The project is unique in its clear focus on business impact. "Data Factory Implementation" gave data professionals (engineers, analysts) and business users a data ecosystem with all the instruments required to process huge amounts of data effectively: precalculated data marts, data discovery tools (glossary and catalog), a data quality engine, and enablers for building and testing advanced analytical models. In addition, data governance processes covering the full cycle of data asset development were established during the project and incorporated into the operational activities of data-related teams.
All of the above was translated into clear business metrics:
1) Time-2-Market for the implementation of any data asset (data mart, BI dashboard, ML model, etc.) decreased by ~17%;
2) The conversion rate of analytical hypotheses into working solutions increased 2.5x.

Used software
ETL instruments: Apache Airflow, Luigi, RabbitMQ, Spark, Dask, and Ray;
Streaming ETL: Informatica Change Data Capture (CDC) and Apache Kafka;
Storage system: Hadoop HDFS with Apache Impala, Hive Metastore, Hadoop YARN, Apache Ranger;
MPP database: Greenplum;
Analytics engine: ClickHouse;
Data Governance tools: Informatica Axon Data Governance, Informatica EDC (Enterprise Data Catalog), Informatica Data Quality.

Difficulty of implementation
The main difficulties of the "Data Factory Implementation" project fall into two groups: technical and organizational.
On the technical side, the main challenge was integrating a fragmented and sophisticated IT landscape into Data Factory: more than 100 internal and external sources, each with its own business logic, together producing about 4 petabytes of data. Integrating these datasets required cross-functional engagement, with contributions from both data professionals (engineers, analysts) and business users.
On the organizational side, it was quite time-consuming to move the company's employees from the legacy ways of working with data to the new ones: providing descriptions, quality rules, and lineage for their data assets. Another challenge was explaining what "data ownership" actually means and what would change for each user. This demanded many educational and change-management initiatives from the project leaders.

Project Description
The "Data Factory Implementation" project included two milestones:

1) Data Strategy Development;
2) Data Strategy Implementation (DWH refactoring and incorporation of data governance processes into day-to-day activities).

Data Strategy Development consisted of the following parts:
1.1) As Is Assessment: JSC "Kazakhtelecom" was assessed in terms of its data landscape, data governance operating model (people and processes), and analytical use cases;
1.2) To Be Model Creation: the target data architecture (including functional components and particular vendors) was designed, together with the operating model (target roles, responsibilities, and processes for working with data). In addition, the corporate data model concept was created using an industry model (TM Forum) as a reference. Last but most important is the roadmap for the 3-year horizon, which includes the list of initiatives and the budget required for their implementation.

Data Strategy Implementation was divided into two parts:
2.1) Organizational aspect: this included several data governance training seminars, which explained the specific changes in the day-to-day operations of the business lines. After the training courses, the operating model was piloted on the Retail business line. It was then adjusted based on the lessons learned and scaled to the whole organization.
2.2) Technical aspect: it was decided to set up a full track for one specific data domain, and managerial reporting was chosen. The track began with the collection of business requirements, followed by prototyping and development of the data model and data mart layers (on the new functional components). The Data Governance tools came next: the required business and technical metadata were established in the Data Catalog and Business Glossary, with lineage created at the same time. In addition, data quality rules were collected and onboarded to the Data Quality tool to provide continuous monitoring.

This approach was approved together with the client and used to establish iterative refactoring of the previous Data Warehouse solution.
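
The data quality step of the technical track (2.2) can be illustrated with a minimal Python sketch. It is not the project's implementation: the rules in the project were onboarded to Informatica Data Quality, and the rule names, column names, sample rows, and the 0.95 threshold below are all hypothetical, chosen only to show what "rules with continuous monitoring" could look like for a managerial reporting data mart.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class QualityRule:
    """Hypothetical rule definition: a named check applied to one column."""
    name: str
    column: str
    check: Callable[[object], bool]  # returns True when a value passes


def evaluate(rows: Iterable[dict], rules: List[QualityRule]) -> Dict[str, float]:
    """Return the share of rows passing each rule (1.0 for an empty input)."""
    rows = list(rows)
    results = {}
    for rule in rules:
        if not rows:
            results[rule.name] = 1.0
            continue
        passed = sum(1 for row in rows if rule.check(row.get(rule.column)))
        results[rule.name] = passed / len(rows)
    return results


def failing_rules(pass_rates: Dict[str, float], threshold: float = 0.95) -> List[str]:
    """List rules whose pass rate is below the (assumed) alert threshold."""
    return sorted(name for name, rate in pass_rates.items() if rate < threshold)


# Hypothetical rules for a managerial reporting data mart.
RULES = [
    QualityRule("revenue_not_null", "revenue", lambda v: v is not None),
    QualityRule("revenue_non_negative", "revenue",
                lambda v: v is not None and v >= 0),
    QualityRule("business_line_known", "business_line",
                lambda v: v in {"Retail", "Corporate", "Technical"}),
]

# Made-up sample rows, not project data.
SAMPLE_ROWS = [
    {"business_line": "Retail", "revenue": 120.0},
    {"business_line": "Corporate", "revenue": None},
    {"business_line": "Unknown", "revenue": -5.0},
    {"business_line": "Technical", "revenue": 40.0},
]

if __name__ == "__main__":
    rates = evaluate(SAMPLE_ROWS, RULES)
    print(rates)
    print(failing_rules(rates))
```

Running such a check on every load of a data mart, and alerting on the rules returned by `failing_rules`, is one simple way to get the continuous monitoring described above.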



Project geography
The project scope includes all branches of JSC "Kazakhtelecom": the Retail, Corporate, and Technical business lines, the Information Technology division, and the Central division, which maintains the functioning of all the organization's processes. Overall, the project can be summarized by the following metrics:
1) More than 100 internal and external sources producing about 4 petabytes of data were integrated into Data Factory;
2) More than 8000 raw tables were converted to data models and data marts with descriptions, quality rules, and assigned business owners recorded in the data governance tools;
3) More than 10 data governance processes were formulated, formalized, and implemented into day-to-day operations;
4) More than 100 people (internal employees, integrators, and consultants) took part in the project implementation;
5) The full project took a year and a half, including strategy development and implementation.
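
Metric 2 above says that converted tables carry descriptions, quality rules, and assigned business owners. As a rough illustration, the Python sketch below shows how that kind of metadata completeness could be tracked. It is an assumption-based sketch: `CatalogEntry`, its fields, and the table names are hypothetical and do not reflect the Informatica EDC or Axon data model.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one table (not an Informatica schema)."""
    table: str
    description: str = ""
    business_owner: str = ""
    quality_rules: List[str] = field(default_factory=list)

    def missing_metadata(self) -> List[str]:
        """Return the governance attributes that are still empty."""
        missing = []
        if not self.description.strip():
            missing.append("description")
        if not self.business_owner.strip():
            missing.append("business_owner")
        if not self.quality_rules:
            missing.append("quality_rules")
        return missing


def governance_coverage(entries: Iterable[CatalogEntry]) -> float:
    """Share of entries with all three attributes filled (0.0 if no entries)."""
    entries = list(entries)
    if not entries:
        return 0.0
    complete = sum(1 for entry in entries if not entry.missing_metadata())
    return complete / len(entries)


if __name__ == "__main__":
    # Made-up entries for illustration only.
    entries = [
        CatalogEntry("dm_revenue", "Monthly revenue mart", "Finance",
                     ["revenue_not_null"]),
        CatalogEntry("raw_cdr"),
    ]
    for entry in entries:
        print(entry.table, entry.missing_metadata())
    print(governance_coverage(entries))
```

A coverage figure of this kind gives data owners a simple target while raw tables are being converted and documented.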
