Implement Phenix Data Platform

Customer
Crisis Text Line
Project manager on the customer side
Alexey Artemov
Data platform Tech-lead
Year of project completion
2023
Project timeline
January, 2023 - September, 2023
Project scope
7200 man-hours
Goals

Implement a Data Platform as a product that fulfils the following criteria:

  • Establish a Central Data Hub: Develop a robust Data Platform serving as the singular, indisputable source of truth for all organisational data, ensuring accuracy and consistency across the board.
  • Empower Data Science and Research Teams: Provide state-of-the-art tools and resources to our Data Science and Research analyst teams, enabling them to delve deep into data, extract meaningful patterns, and drive innovative research initiatives.
  • Revolutionise Machine Learning Lifecycles: Provide tools and techniques to implement an end-to-end framework for Machine Learning models, encompassing exploration, training, evaluation, deployment, and seamless serving, with a focus on advanced LLMs, ensuring cutting-edge AI capabilities.
  • Optimised Workload Management: Engineer the platform to seamlessly handle both batch and real-time data processing with minimal latency, guaranteeing swift and efficient data processing regardless of the workload type, enhancing overall operational efficiency.
  • Global Deployment, Local Compliance: Design the platform to be deployable in diverse international settings while rigorously adhering to specific local regulations, such as GDPR in the European Union, ensuring compliance and data integrity on a global scale.

Project Results

Achievements:

  • Successfully onboarded Data Science & Research teams to the Platform
  • Completed development of the first prototypes for the risk assessment model & conversation simulator for our Volunteer Counselors
  • Deployed the Platform to 2 countries (more to come)

Technical accomplishments:

  • Installed all Data Platform components using Infrastructure as Code (IaC), enabling us to rapidly deploy the platform to new countries within a few days
  • Developed robust ingest & transformation data pipelines, including data scrubbing/anonymization to ensure the removal of Personally Identifiable Information (PII), with built-in monitoring & data quality features
  • Automated Continuous Integration/Continuous Deployment (CI/CD) pipelines
  • Data processing adhered to local regulations
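
As an illustration of the scrubbing step mentioned above, here is a minimal Python sketch. It is not the project's actual pipeline: the regular expressions, the [EMAIL] and [PHONE] placeholders, the "message" field name, and the scrub_pii / scrub_record helpers are all hypothetical, and real PII removal would need far broader rules (names, addresses, and so on).

```python
import re

# Hypothetical patterns for illustration only; they cover just two kinds of PII.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


def scrub_record(record: dict, text_fields: tuple = ("message",)) -> dict:
    """Return a copy of the record with the listed text fields scrubbed.

    The input record is left unchanged so the raw data can be handled
    (or discarded) separately from the anonymized copy.
    """
    cleaned = dict(record)
    for field in text_fields:
        if isinstance(cleaned.get(field), str):
            cleaned[field] = scrub_pii(cleaned[field])
    return cleaned
```

A transformation step could apply scrub_record to every row before it is written to a shared layer, so that downstream users only ever see the anonymized copy.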

The uniqueness of the project

We are pioneers in collaborating with individuals worldwide, spanning diverse languages and cultures, to provide unparalleled mental health support. Our groundbreaking approach includes robust multi-language support, ensuring accessibility for all. To elevate our services, we harness cutting-edge technologies, particularly leveraging highly effective LLMs on our state-of-the-art Data Platform.

However, the complexities of our data necessitate an internal approach. We are dedicated to developing and implementing our Data Platform using exclusive, in-house resources. This pioneering endeavour ensures the secure management of our valuable data, safeguarding it from external access.

Our Data Platform stands as a testament to innovation, effortlessly accommodating various workloads. It boasts the capability to swiftly allocate essential resources to our esteemed Data Science and Research teams, empowering them to transform groundbreaking ideas into tangible solutions. These solutions, honed to perfection, are seamlessly transitioned into production. Equipped with indispensable features like real-time monitoring, scalability, and detailed logging, our solutions exemplify excellence at every stage of deployment.

Moreover, our commitment to compliance knows no bounds. Adhering to stringent local regulations is not just a requirement; it's our commitment. Any necessary changes are promptly integrated and deployed to the respective countries, ensuring that our pioneering mental health support services continue to set new standards worldwide.

Used software

The following technologies have been used:

  • AWS (infrastructure, AWS DMS, S3, etc)
  • Databricks
  • Github
  • Terraform & Terragrunt for IaC

Difficulty of implementation
  • Initially, we decided to use Change Data Capture (CDC) to gather data from source systems. However, we ran into issues with AWS DMS that were hard to troubleshoot, and we realised how much effort incremental ingestion and transformations would require. As a result, we opted for batch ingestion to begin with, pulling data multiple times per day. This approach was considered sufficient from a business perspective.
  • The initial infrastructure was deployed without Infrastructure as Code (IaC), which caused issues when adjustments were needed and when multi-country support was expected. Consequently, we implemented IaC to address these concerns.
  • Our CI/CD process is implemented using various technologies. Some parts are automated using GitHub Actions, while others rely on executing Terraform/Terragrunt scripts. In the future, we aim to unify and simplify our process.
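
The batch approach from the first point can be pictured as a periodic full refresh of each source table. The sketch below is only an assumption about what such a pull might look like, not the project's code: it uses in-memory sqlite3 databases as stand-ins for the source system and the platform storage, and the two-column table layout and the full_refresh function are hypothetical.

```python
import sqlite3


def full_refresh(source: sqlite3.Connection, target: sqlite3.Connection, table: str) -> int:
    """Copy a whole table from source to target, replacing the previous copy.

    `table` is assumed to be a trusted identifier with (id, body) columns.
    Returns the number of rows copied.
    """
    rows = source.execute(f"SELECT id, body FROM {table}").fetchall()
    with target:  # commits on success, rolls back if an error is raised
        target.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, body TEXT)"
        )
        target.execute(f"DELETE FROM {table}")
        target.executemany(f"INSERT INTO {table} (id, body) VALUES (?, ?)", rows)
    return len(rows)
```

Because every run replaces the target copy, running the job several times a day is idempotent: a repeated or retried run leaves the same result. The trade-off against CDC is that changes become visible only at the next scheduled pull, and each pull rereads the whole table.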

Project Description

At the core of our company's mission lie empathy and innovation. Our primary objective is to enhance mental well-being for individuals worldwide, transcending geographical barriers.

To realise this mission, we recognise the need to harness modern technologies, including LLMs. The central focus of our Data Platform is transitioning from our existing legacy system to a more robust, dynamic, and industry-grade infrastructure. This transition will empower us with cutting-edge capabilities, enabling us to deliver unparalleled mental health support. On top of the Data Platform, we will implement the following key enhancements:

  • Developing realistic conversation simulations for volunteers using LLMs, built on our historical data.
  • Improving message classification (risk level) to ensure the prompt assignment of Volunteer Crisis Counselors to critical cases.
  • Real-time monitoring of volunteer demand spikes, facilitating rapid outreach to Volunteer Crisis Counselors to address increased needs.
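
To make the demand-spike monitoring idea more concrete, here is a minimal Python sketch under stated assumptions. The source does not describe the detection method; the rolling-window rule, the SpikeDetector class, and its window and threshold parameters are hypothetical choices made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class SpikeDetector:
    """Flag a count that is far above the recent rolling average.

    Hypothetical rule: a value is a spike when it exceeds
    mean + threshold * stddev of the last `window` observations.
    """

    def __init__(self, window: int = 12, threshold: float = 3.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, count: int) -> bool:
        """Record a new per-interval count and report whether it is a spike."""
        is_spike = False
        if len(self.history) == self.history.maxlen:
            mu = mean(self.history)
            # Floor the deviation at 1.0 so a perfectly flat history
            # does not turn every small increase into a spike.
            sigma = max(pstdev(self.history), 1.0)
            is_spike = count > mu + self.threshold * sigma
        self.history.append(count)
        return is_spike
```

Feeding the detector one count per interval (for example, incoming conversations per five minutes) returns True only once the window is full and the new value clearly exceeds recent demand; that signal could then trigger outreach to counselors.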

With these initiatives, we aim to make a significant positive impact on mental well-being, regardless of individuals' locations.

Project geography
US, EU
