
Data Factory Implementation

Customer: JSC "Kazakhtelecom"
Project manager on the customer side: Marat Abdildabekov, CIO
IT Provider: Axellect
Year of project completion: 2023
Project timeline: June 2022 - November 2023
Project scope: 22,300 man-hours
Goals
Before the "Data Factory Implementation" project started, the following goals were set:
1) Set up a clear strategic roadmap for Data Factory development on a 3-year horizon;
2) Provide Data Factory consumers with user-friendly instruments (data model, data marts, and data governance tools) to work with data effectively;
3) Establish standardized, company-wide approaches and processes for handling datasets;
4) Increase the conversion rate of analytical hypotheses into working solutions that drive business value;
5) Decrease Time-2-Market for the implementation of any data asset (data mart, BI dashboard, ML model, etc.);
6) Demonstrate to the company's executives the importance of investing in data management and data governance.

Project Results
The main results of the project are:

Qualitative:
1) A single view of Data Function development;
2) A clear operating model (roles and processes);
3) An approach to driving the Data Function that the client can use independently in the future.

Quantitative:
1) Time-2-Market for the implementation of any data asset (data mart, BI dashboard, ML model, etc.) decreased by ~17%;
2) The conversion rate of analytical hypotheses into working solutions increased 2.5x.


The uniqueness of the project

The project is distinguished by its clear focus on business impact. "Data Factory Implementation" provided d-people (engineers, analysts) and business users with a data ecosystem that brings together all the instruments required to process huge amounts of data effectively: precalculated data marts, data discovery tools (glossary and catalog), a data quality engine, and enablers for building and testing advanced analytical models. Additionally, during the project, data governance processes covering the full cycle of data asset development were established and incorporated into the operational activities of data-related teams.
All of the above was translated into clear business metrics:
1) Time-2-Market for the implementation of any data asset (data mart, BI dashboard, ML model, etc.) decreased by ~17%;
2) The conversion rate of analytical hypotheses into working solutions increased 2.5x.

Used software
ETL instruments: Apache Airflow, Luigi, RabbitMQ, Spark, Dask, and Ray;
Streaming ETL: Informatica Change Data Capture (CDC) and Apache Kafka;
Storage system: Hadoop HDFS with Apache Impala, Hive Metastore, Hadoop YARN, Apache Ranger;
MPP database: Greenplum;
Analytics engine: ClickHouse;
Data Governance tools: Informatica Axon Data Governance, Informatica EDC (Enterprise Data Catalog), Informatica Data Quality.

Difficulty of implementation
The main difficulties of the "Data Factory Implementation" project can be divided into technical and organizational ones.
On the technical side, the main challenge was to integrate a fragmented and sophisticated IT landscape into the Data Factory. There are more than 100 internal and external sources, each with its own business logic, together producing about 4 petabytes of data. Integrating such datasets required cross-functional engagement: both d-people (engineers, analysts) and business users contributed.
On the organizational side, it was quite time-consuming to move the company's employees from "the legacy" to the new ways of working with data: providing descriptions, quality rules, and lineage for their data assets. Another task was to explain what "data ownership" actually means and what would change for a particular user. This demanded many educational and change-management initiatives from the project leaders.

Project Description
The "Data Factory Implementation" project included the following milestones:

1) Data Strategy Development;
2) Data Strategy Implementation (DWH refactoring and incorporation of data governance processes into day-to-day activities).

Data Strategy Development consisted of the following parts:
1.1) As Is Assessment: JSC "Kazakhtelecom" was assessed in terms of its data landscape, data governance operating model (people and processes), and analytical use cases;
1.2) To Be Model Creation: the target data architecture (including functional components and particular vendors) was designed, as well as the operating model (target roles, responsibilities, and processes for working with data). In addition, the corporate data model concept was created using an industry model (TM Forum) as a reference. Last but most important is the roadmap for the 3-year horizon, which includes the list of initiatives and the budget required for their implementation.

Data Strategy Implementation was divided into two parts:
2.1) Organizational aspect: this included several data governance training seminars, which explained the specific changes in the day-to-day operations of business lines. After the training courses, the operating model was piloted on the Retail business line. It was then adjusted based on the lessons learned and scaled to the whole organization.
2.2) Technical aspect: it was decided to set up a full track for one concrete data domain, and managerial reporting was chosen. The track began with the collection of business requirements, followed by prototyping and the development of the data model and data mart layers (on the new functional components). Next, the Data Governance tools came into play: the required business and technical metadata were established in the Data Catalog and Business Glossary, with lineage created at the same time. In addition, data quality rules were collected and onboarded to the Data Quality tool to provide continuous monitoring.

This approach was approved together with the client and used to establish iterative refactoring of the previous Data Warehouse solution.
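
The data quality step described in 2.2 can be illustrated with a minimal sketch. It is not the Informatica Data Quality interface and not the project's actual rule set: the QualityRule class, the evaluate function, the rule names, the column names, and the sample rows are all hypothetical, chosen only to show the idea of declaring rules once and measuring the pass rate per rule on a data mart.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]  # one record of a data mart, keyed by column name


@dataclass(frozen=True)
class QualityRule:
    """A named check applied to a single column (hypothetical structure)."""
    name: str
    column: str
    check: Callable[[Optional[object]], bool]


def evaluate(rows: Iterable[Row], rules: List[QualityRule]) -> Dict[str, float]:
    """Return, for each rule, the share of rows that pass it."""
    materialized = list(rows)
    report: Dict[str, float] = {}
    for rule in rules:
        passed = sum(1 for row in materialized if rule.check(row.get(rule.column)))
        # An empty data mart has no violations, so it is reported as fully passing.
        report[rule.name] = passed / len(materialized) if materialized else 1.0
    return report


# Illustrative rules for a managerial-reporting mart (assumed columns).
RULES = [
    QualityRule("revenue_not_null", "revenue", lambda v: v is not None),
    QualityRule("revenue_non_negative", "revenue", lambda v: v is not None and v >= 0),
    QualityRule(
        "business_line_known",
        "business_line",
        lambda v: v in {"Retail", "Corporate", "Technical"},
    ),
]

# Made-up sample rows, not project data.
SAMPLE = [
    {"business_line": "Retail", "revenue": 120.0},
    {"business_line": "Corporate", "revenue": None},
    {"business_line": "Unknown", "revenue": -5.0},
    {"business_line": "Technical", "revenue": 40.0},
]

if __name__ == "__main__":
    for name, rate in evaluate(SAMPLE, RULES).items():
        print(f"{name}: {rate:.0%} passed")
```

Running such a check on a schedule and tracking the pass rates over time is one simple way to picture the continuous monitoring mentioned above; in the project itself this role was played by the Data Quality tool.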



Project geography
The project scope includes all branches of JSC "Kazakhtelecom": the Retail, Corporate, and Technical business lines, the Information Technology division, and the Central division, which maintains the functioning of all the organization's processes. Overall, the project can be summarized by the following metrics:
1) More than 100 internal and external sources producing about 4 petabytes of data were integrated into the Data Factory;
2) More than 8,000 raw tables were converted to data models and data marts with descriptions, quality rules, and assigned business owners recorded in the data governance tools;
3) More than 10 data governance processes were formulated, formalized, and implemented in day-to-day operations;
4) More than 100 people (internal employees, integrators, and consultants) took part in the project implementation;
5) The full project, including strategy development and implementation, took a year and a half.
