FintechOS Cloud (for v22.x)
This document provides detailed information regarding different scenarios for FintechOS deployments in a FintechOS Cloud model.
Generic FintechOS Cloud Deployment
Based on the chosen or required model, there are two main deployment scenarios for production environments: single region deployment and dual region deployment.
Single Region Deployment (Specific for Standard Model)
An Azure PaaS model is used to deliver services, high availability, load balancing, and backup capabilities offered by the cloud provider using state-of-the-art technologies.
Dual Region Deployment (Default for Enterprise Model with the Disaster Recovery Add-on)
In the Enterprise model with the Disaster Recovery (DR) add-on, Azure paired regions are used to deliver high availability, best performance, and data resilience. Databases are continuously replicated between regions and, in case one region is affected by a disaster, the solution will be available in the secondary region.
Default Regions
FintechOS is running by default in the following regions:
| Zone | Primary Region | Secondary Region |
|---|---|---|
| EU | West Europe | North Europe |
| UK | UK South | UK West |
| US | East US | West US |
FintechOS can also deploy in other regions based on customer requirements.
Base Cloud Tiers
Standard and Enterprise models include one sandbox used by the development team to configure and customize the platform to suit customer requirements, one sandbox environment for test purposes before promoting into production, and a production environment. Compared to the Standard solution, the Enterprise model comes with better production performance and an optional disaster recovery environment deployed in another region to ensure continuity in case of a disaster.
All models come with infrastructure monitoring and security operations including active monitoring tasks, continuous improvement, component updates, security, and auditing for infrastructure resources. Platform upgrade tasks, application support, and application incident management are not included.
Based on customer requirements, additional environments can be added on request.
Add-Ons
In the EU, UK, and US regions, the following add-ons are available.
| Type | Add-ons |
|---|---|
| Disaster Recovery | DR Region |
| Additional Environments | Sandbox |
| | Standard Production |
| | Enterprise Production |
| Additional Resources | Storage |
| | Portal |
| Optional Components | Anonymous Front-end |
| | VPN* |
| | Dedicated API |
| | Additional Logging System |
| Performance Upgrade | Depending on customer specific performance/load scenarios, FintechOS will analyze and provide an optimized solution to suit customer requirements. This analysis, apart from the compute performance requirements, will also include extra storage, logging, and traffic requirements (inter-region communication). |
*One VPN connection covers only one environment. If a customer has two sandbox environments and one production environment and VPN is required for each environment, then the VPN should be added to each environment.
Disaster Recovery
The Disaster Recovery add-on (DR Region) offers the following:
| Feature | Value |
|---|---|
| Backup Services | Yes |
| Redundancy | Multi Region |
| Disaster Recovery Model | Manual Failover |
| RTO (Recovery Time Objective) | 2h |
| RPO (Recovery Point Objective) | 15 min |
| Failover Model | Manual Failover |
| SLA | 99.98% |
Sandbox
Sandbox environments are low performance environments used in non-production scenarios by a limited number of users.
FintechOS does not offer an SLA for Sandbox environments because these environments are considered development and test environments used mostly for bug fixing, solution updates, and testing new scenarios.
From a functional perspective, all available features for Sandbox environments are equivalent to production environments. Security logging and proactive measures are at a lower level than in Production environments and exclude SIEM capabilities.
Standard Production
Standard models represent production environments running in a single region. FintechOS deploys services required by customers in the region based on the regulatory compliance for that region. The standard model guarantees a minimum level of performance.
Production environments have all the monitoring measures in place, come with a predefined performance level according to the offer, and the latest versions of software, products, or updates are pushed live to the intended users. This is the environment where the end user can see, experience, and interact with the product. In a production environment, regardless of the performance required for running the load, a minimum size for infrastructure components is required to achieve the security and availability standard required.
The following performance indicators are guaranteed for standard environments:
| Performance Indicator | Value |
|---|---|
| Concurrent users | 15 |
| Response time | less than 5 seconds |
| Number of hits/sec | 20 |
| Disaster recovery | restore from backup |
Enterprise Production
Enterprise production environments are standard production environments with upgraded guaranteed performance indicators:
| Performance Indicator | Value |
|---|---|
| Concurrent users | 25 |
| Response time | less than 5 seconds |
| Number of hits/sec | 50 |
| Disaster recovery | two-region deployment |
Concurrent Users means the number of end-users who initiate a request to the same FintechOS Platform resource at the same time.
Example: One user initiates one or multiple requests when clicking the Submit button on a form in the FintechOS Portal. A user who is only logged into FintechOS Portal, but does not input data or only reads the displayed information is not considered a concurrent user. The number of concurrent users should be evaluated by the customer based on each use case.
Example: 100 users are connected to the platform requesting a loan and all of them are filling their data at the same time. If we consider it takes 2 minutes to add personal data and FintechOS requires 5 seconds to evaluate and approve the input data, the number of concurrent users is considered to be between 4 and 6.
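The estimate in the loan example can be reproduced with a short calculation. This is a minimal sketch, assuming each user repeats a cycle of data entry followed by server-side processing and that only the processing part counts as concurrent activity. The function name and this averaging model are illustrative assumptions, not an official FintechOS sizing formula.

```python
def estimate_concurrent_users(active_users: int,
                              think_time_s: float,
                              processing_time_s: float) -> float:
    """Average number of users whose request is being processed at any instant.

    Each user spends think_time_s entering data and processing_time_s waiting
    for the platform, so the fraction of time spent in a request is
    processing_time_s / (think_time_s + processing_time_s).
    """
    cycle_s = think_time_s + processing_time_s
    return active_users * processing_time_s / cycle_s


# Values from the example: 100 users, 2 minutes of input, 5 seconds of processing.
estimate = estimate_concurrent_users(active_users=100,
                                     think_time_s=120,
                                     processing_time_s=5)
print(f"{estimate:.1f}")  # 4.0
```

The result is an average. The range of 4 to 6 quoted in the example leaves headroom for moments when more users than average submit at the same time.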
The Response time is calculated for a business transaction initiated in browser.
Example: The response time for login includes the page load time for the login screen, credential input/validation (get token), landing page load time – all actions in under 5 seconds.
Storage
By default, all models include 3 TB of storage, which is enough for common scenarios. Some business scenarios require extended storage, and FintechOS offers the possibility to extend the default model based on customer requirements and business analysis.
Database storage is managed separately, and the default database storage included is 250 GB which is typically enough for common business scenarios.
Portal
The Standard model comes with one Digital Experience Portal, while the Enterprise model includes two.
Anonymous Front-end
You may want to provide customers with unauthenticated access through anonymous front-ends to specific user journeys from a link on your website. FintechOS Studio makes this possible by exposing form driven flows and custom flows to unauthenticated users.
FintechOS can offer unauthenticated user journeys as an additional feature, in order to allow users to perform different journeys without requiring a personal account.
An unauthenticated portal is required when you want to expose a front-end which can be accessed without user credentials. Typical cases include onboarding portals or loan/insurance portals in which a user can create an account or buy a product.
An anonymous front-end environment with a secure architecture has been designed to allow exposing journeys to unauthenticated users (customers).
The reverse proxy ensures a single point of authentication for all HTTP requests, forwarding the requests to the FintechOS B2C App (the one that contains the form driven flow to be exposed). It also handles requests to the FintechOS back-office apps (FintechOS Studio and portals).
VPN
Site-to-site VPN is available in our FintechOS Cloud deployment models and is required for private and secure communication between FintechOS environments and customer on-premises services. It is usually required by internal users to access the platform via private networks without Internet access or to secure access to on-premises APIs/services from the FintechOS environment.
Dedicated API
Additional APIs with specific functions and high performance can be added based on customer requirements.
This is recommended when you need a highly scalable model to offer specific functionalities at a large scale.
Additional Logging System
By default, all models include an embedded log storage and log analytics feature with 5 GB of daily ingested logs and a retention of 3 months, which is enough for most customers. Some business scenarios require extensive logging, and FintechOS offers the possibility to extend the default model based on customer requirements and business analysis.
Performance Upgrade
If your FintechOS tier does not offer the desired performance, we can provide additional performance improvements based on specific business scenarios.
Components (Features) Provisioned
FintechOS Studio
FintechOS Studio is an IDE which gives both citizen developers and software engineers the tools to build, customize, and extend digital solutions on top of the FintechOS Platform.
Using FintechOS Studio, you can configure all the components that make up a comprehensive digital experience, to create personalized products, digital journeys, or back-office applications.
Digital Experience Portal
FintechOS provides various ways for you to streamline your business users' experience by customizing the Digital Experience Portals according to their needs, including Digital Journeys, Visual Branding Support, Enriched Dashboards, and Native Analytics.
Job Server
This component is required to trigger asynchronous tasks based on the input provided by platform components. For example: generate a contract in the background and trigger external APIs, check for specific information by running queries on external services, or run overnight jobs.
Open API
Exposes platform APIs to external services using a standard language. It is used when you want to expose platform functionality to external services such as customer internal services.
Service Pipes
The Service Pipes add-on is required to integrate with external data sources. It brings additional features such as parallel processing, fast integration, and standard configuration available for well-known services (e.g., PayU).
Common Features Offered in Production Environments
*RPO is backed up for database, which contains critical data. Storage account files are stored on an Azure Storage account and have an RPO of 15 minutes as stated by Microsoft.
**not applicable to Sandboxes
Availability
Backup Services
FintechOS simplifies workload protection with built-in backup capabilities for customer data by default, with scalable backup solutions based on your storage needs. Backup policies are in place to protect customer data against loss and are continuously monitored through an internal centralized management interface.
The default backup policies include:
| Backup Item | Backup Solution | Backup Policy |
|---|---|---|
| SQL Database | Azure SQL Database Backup | Incremental backups every 20 minutes; daily backups for 30 days; monthly backups for 12 months |
| App Services | App Services | Weekly backups, 30 days (no data stored, only the app version) |
| Storage Account | Versioning, Soft Delete, Geo-replication | 7 days retention period for soft delete, previous versions available; geo-replication to have a copy of your data in the paired Azure region |
Redundancy
All environments offer a redundant model. Depending on the chosen FintechOS tier, you can benefit from a redundancy model suitable for your workloads.
In Standard models, your data is kept in a single region. The platform keeps data files and databases in a locally redundant storage. This means your data is synchronously copied three times in a single physical location in the primary region.
Enterprise models offer durability of data even in case of a complete outage in the primary region by replicating your data to a secondary region. The data is replicated asynchronously to a single physical location in the secondary region. Within the secondary region, your data is copied synchronously three times in the same location.
Disaster Recovery Model
Disaster recovery is included as a service in Standard and Enterprise models.
The Standard disaster recovery model is based on backups, and it can restore the entire environment to a secondary region which is hundreds of miles away from the primary region. In case of disaster, all your services will be provisioned using predefined automated scripts in the secondary region and data will be restored based on the latest backup information available according to backup policy.
The Disaster Recovery add-on can be added to the Enterprise edition. The Enterprise edition with the DR add-on keeps a secondary region up and running, with all services and your data available in case of disaster, based on asynchronously replicated storage and databases.
In case of disaster, there is always a risk of data loss because services are not available, and some data might be lost beyond our control, within the limits of our committed RPO values.
RTO (Recovery Time Objective)
This is a measure of how long it takes to do the failover in different scenarios. The failover time depends on several factors including database RTO, storage account RTO, and DNS switching time. FintechOS has dependencies on the underlying infrastructure, and RTO time is backed up by Microsoft services. Also, a fully functional environment might have dependencies on the customer integrated services which are not covered by RTOs provided by FintechOS.
When a Disaster is declared, it means that the services are not recoverable in the current region. In this case, the services will be recovered in the desired conditions under agreed RTOs.
Standard environment recovery model RTOs depend on the provisioning time for resources in the secondary region, DNS switching time, database RTO, and storage account data.
Enterprise models offer a better approach by having the secondary region services already available and data replicated asynchronously. In case of disaster, failover is manually triggered based on the Crisis Management Team decision. This approach has the advantage of better control over the data, because we can decide the acceptable level of data loss. However, because a decision must be made before failover starts, the recovery time is affected.
In Enterprise models, both regions are available and switching between environments in case of disaster is fully controlled by Microsoft based on the following:
- Traffic is routed to the secondary region based on predefined health probes.
- Failover automation is configured for the database and all services, and any outage in the primary region is immediately restored in the secondary region based on decision factors. Because data is asynchronously replicated in the secondary region, automatic failover without a human decision is not advised: in some cases, the business loss of switching to a secondary region might be higher than that of fixing the primary region.
- Storage account failover is based on geo-replicated storage and, in case of disaster, failover is fully managed by Microsoft services and policies.
RPO (Recovery Point Objective)
RPO is a measurement of time from the failure, disaster, or comparable loss-causing event. RPOs measure back in time to when your data was preserved in a usable format, usually to the most recent backup or successful replication. Recovery processing preserves any data changes made before the disaster or failure.
In FintechOS, RPOs are determined by database and storage level data.
Standard tier RPOs are based on backup frequency for database and geo-replicated storage data. The Enterprise tier uses full replication mechanisms for data and storage, offering better RPO times.
Failover Model
For the Enterprise tier, a secondary region is already provisioned and is ready to accept workloads. In case of disaster, a failover and a switch to the secondary region can be performed. Failover in Enterprise tiers is performed on demand based on a business decision related to the impact level of the disaster in the primary region.
A forced failover is a consequence of a disaster and it might result in data loss considering the asynchronous replication mechanisms. Estimated data loss is based on the RPO detailed above.
Performance
Performance Enhancement
All FintechOS environments are based on Azure PaaS services that offer state-of-the-art technology to host your services while providing you the best cost/performance balance. Standard and Enterprise tiers provide enhanced performance using Premium level services to host FintechOS workloads.
Performance Model
The general purpose model is designed for applications with typical availability and common I/O latency requirements, and it has improvements such as auto-scale and premium tier resources for front-end workloads.
A business critical tier is the optimal choice for mission-critical workloads offering the highest performance tiers, lowest latency, auto-scale features, and additional high availability measures based on state-of-the-art underlying technology.
Adaptive Performance
With adaptive performance, you do not need to worry that the system will stall during peak loads. The production environment is configured to increase performance based on predefined rules related to performance metrics such as CPU or memory usage. FintechOS front-end components are configured to automatically create additional nodes (scale out) when performance drops below predefined criteria.
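The scale-out behavior described above can be illustrated with a small rule evaluation. This is only a sketch under stated assumptions: the `ScaleRule` type, the threshold values, and the node limit are hypothetical and do not reflect the actual FintechOS or Azure autoscale configuration.

```python
from dataclasses import dataclass


@dataclass
class ScaleRule:
    # Hypothetical thresholds, chosen only for illustration.
    cpu_threshold_pct: float = 70.0
    memory_threshold_pct: float = 80.0
    max_nodes: int = 10


def desired_node_count(current_nodes: int, cpu_pct: float,
                       memory_pct: float, rule: ScaleRule) -> int:
    """Add one node when CPU or memory usage exceeds its threshold."""
    overloaded = (cpu_pct > rule.cpu_threshold_pct
                  or memory_pct > rule.memory_threshold_pct)
    if overloaded and current_nodes < rule.max_nodes:
        return current_nodes + 1
    return current_nodes


rule = ScaleRule()
print(desired_node_count(2, cpu_pct=85.0, memory_pct=40.0, rule=rule))  # 3
print(desired_node_count(2, cpu_pct=30.0, memory_pct=40.0, rule=rule))  # 2
```

A real autoscale rule would typically also evaluate metrics over a time window and scale back in when load drops. Both are omitted here for brevity.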
Security
Encryption in Transit
Protects all your data if communications are intercepted while data moves between clients and FintechOS or between services. This protection is achieved by encrypting the data before transmission. All endpoints are securely exposed using HTTPS with TLS 1.2 or above.
Data Encryption at Rest
Encryption at rest provides data protection for stored data (at rest). Attacks against data at rest include attempts to obtain physical access to the hardware on which the data is stored, and then compromise the contained data. FintechOS and Microsoft Azure include tools to safeguard data according to security and compliance needs. Data is encrypted at rest without the risk or cost of a custom key management solution. All encryption keys are automatically managed by Microsoft. Data in Azure Storage is encrypted and decrypted transparently using 256-bit AES encryption, one of the strongest block ciphers available, and is FIPS 140-2 compliant. Azure Storage encryption is similar to BitLocker encryption on Windows. At database level, FintechOS uses Azure SQL Database which is encrypted by default using Transparent data encryption (TDE).
Vulnerability Assessment
All FintechOS production environments offer an integrated vulnerability assessment solution (powered by Qualys) which is one of the leading tools for real-time identification of vulnerabilities.
In addition, all exposed services undergo a strict process of periodic vulnerability scanning managed by FintechOS.
Access Restrictions Based on Custom Policies
Access control lists (ACLs) are a collection of permit and deny conditions that provide security by blocking unauthorized users and allowing authorized users to access FintechOS resources. ACLs are custom-configured on a need-to-know basis on all FintechOS environments. Depending on project needs and customer requirements, ACLs will block any unwarranted attempts to reach network resources.
Antimalware Prevention
Antimalware prevention is based on Microsoft Antimalware for Azure, a single-agent solution for applications and tenant environments, designed to run in the background without human intervention. Protection is deployed based on application workloads, with secure-by-default and advanced custom configuration, including antimalware monitoring.
Web Application Firewall
FintechOS environments are protected using a web application firewall (WAF), which provides centralized protection of web applications from common exploits and vulnerabilities. Web applications are increasingly targeted by malicious attacks that exploit commonly known vulnerabilities, and the WAF can detect and block such requests based on a set of rules. The rules are configured and applied according to the application requirements, and rules that produce false positives are removed in order to offer the best customer experience while maintaining strong protection against attacks.
Secret and Certificate Management
All FintechOS environments include a secret and certificate management solution. Storing and handling secrets, encryption keys, and certificates directly is risky, and every usage introduces the possibility of unintentional data exposure. FintechOS uses a Key Vault to provide a secure storage area for managing all app secrets so you can properly encrypt your data in transit or while it is stored.
Cloud Workload Protection Platform
Using AI and automation, we quickly identify threats, streamline threat investigation, and automate remediation. The Security Center's integrated cloud workload protection platform (CWPP) brings advanced, intelligent protection of workloads. The Azure Security Benchmark provides a truly customized view of your compliance.
Secure API Exposure
We employ a dedicated API management tool to optimize API traffic flow and meet security and compliance requirements while having a unified management experience and full observability across all APIs.
Automatic Threat Prevention
Built-in threat protection functionality is provided through services such as Azure Monitor logs and Azure Security Center. This collection of security services and capabilities provides a simple and fast way to understand what is happening with your FintechOS deployments.
Incident Management
Incident handling is centrally managed and automated using automation rules, which define and assign playbooks to incidents (not just to alerts). Automation rules also allow you to automate responses for multiple analytics rules at once, automatically tag, assign, or close incidents, and control the order of actions that are executed. Automation rules streamline automations used in Azure Sentinel to simplify complex workflows for your incident orchestration processes.
Active Traffic Monitoring & Always-On Detection
Services running on Azure are inherently protected by the default infrastructure-level DDoS protection, which provides defense against common network-layer attacks through always-on traffic monitoring and real-time mitigation.
SIEM with Built-in AI
Azure Sentinel delivers intelligent security analytics and threat intelligence, providing a single solution for alert detection, threat visibility, proactive hunting, and threat response.
Proactive Threat Detection
Investigate threats with artificial intelligence, hunt for suspicious activities at scale, tapping into years of cyber security work at Microsoft and respond to incidents with built-in orchestration and automation of common tasks.
SLA
Understanding customer availability expectations is vital to reviewing overall operations for the application. For instance, if a customer strives to achieve an application SLA of 99.99%, the level of activity required by the application is going to be far greater than if an SLA of 99.9% is the goal.
An uptime of 99.99% translates to about five minutes of total downtime per month. Is it worth the extra complexity and cost to reach five nines? The answer depends on the business requirements/criticality.
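To make the downtime figures concrete, the sketch below converts an SLA percentage into an allowed downtime budget. It assumes a 30-day month as the measurement window; the function name and the window are illustrative choices, not contractual definitions.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float = 30) -> float:
    """Downtime budget, in minutes, for a given SLA over the measurement window."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - sla_percent / 100)


# Compare a few common availability targets over a 30-day window.
for sla in (99.9, 99.98, 99.99, 99.999):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.2f} min per 30 days")
```

Under this assumption, 99.99% allows a little over four minutes of downtime per month, while 99.9% allows roughly ten times that.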
Other considerations when defining an SLA:
- To achieve four nines (99.99%), you can't rely on manual intervention to recover from failures. The application must be self-diagnosing and self-healing.
- Beyond four nines, it is challenging to detect outages quickly enough to meet the SLA.
- Think about the time window that your SLA is measured against. The smaller the window, the tighter the tolerances. It doesn't make sense to define the SLA in terms of hourly or daily uptime.
- Get agreement from customers for the availability targets of each piece of the application, and document it.
The Service Level Agreement describes the FintechOS commitment for uptime and connectivity. If the SLA for the platform is 99.9%, you should expect the service to be available 99.9% of the time. Additional services might have different SLAs. As an example, if you add an OCR connector to the platform and you use it in a customer journey, the composite SLA should be calculated as a combination of the platform SLA and the OCR connector SLA.
The SLA also includes provisions for obtaining a service credit if the SLA is not met, along with specific definitions of availability for each service. That aspect of the SLA acts as an enforcement policy.
How We Calculate FintechOS Cloud SLA
The SLA is calculated as a composite taking into consideration all components of the platform based on the models below.
Single Region Deployment (Standard Model)
For a single region deployment model (Standard Model), the SLA is calculated as a composite SLA, taking into consideration all components of the platform running in the deployment region.
Single region system design is not prepared to recover from a failure of the entire region and therefore such events are considered Force Majeure.
For example, when we have a service component that writes to the database and the service has a 99.95% SLA while the database has a 99.99% SLA, the availability you can expect from the application is lower than that of either individual component, because if either service fails, the whole application fails.
The probability of each service failing is independent, so the composite SLA for this setup is 99.95% × 99.99% = 99.94%. That's lower than the individual SLAs, which isn't surprising because an application that relies on multiple services has more potential failure points.
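The multiplication above generalizes to any number of components that must all be available. The following sketch assumes independent failures, as the text does; the function name is illustrative.

```python
from math import prod
from typing import Iterable


def composite_sla_serial(component_slas: Iterable[float]) -> float:
    """Composite SLA when every component must be up (independent failures)."""
    return prod(component_slas)


# Service (99.95%) writing to a database (99.99%), as in the example.
result = composite_sla_serial([0.9995, 0.9999])
print(f"{result:.4%}")  # 99.9400%
```

Adding another dependency, such as the OCR connector mentioned earlier, means adding its SLA to the list, which can only lower the composite value.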
SLAs for Multiregion Deployments (Enterprise Model with DR add-on)
SLAs for multiregion deployments involve a high-availability technique to deploy the application in more than one region and use a traffic management tool to fail over if the application fails in one region.
Multiregion system design is not prepared to recover from a failure of paired regions or global failures and therefore such events are considered Force Majeure.
The composite SLA for a multiregion deployment is calculated as follows:
- N is the composite SLA for the application deployed in one region.
- R is the number of regions where the application is deployed.
The expected chance that the application fails in all regions at the same time is (1 − N) ^ R, so the composite SLA is 1 − (1 − N) ^ R. For example, if the single-region SLA is 99.95%, the combined SLA for the two regions is 1 − (1 − 0.9995) ^ 2 = 99.999975%.
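The multiregion calculation can be checked with a few lines of code. This is a sketch that assumes regions fail independently and ignores the SLA of the traffic management component itself, which a full calculation would also need to include; the function name is illustrative.

```python
def composite_sla_multiregion(single_region_sla: float, regions: int) -> float:
    """Composite SLA when the application is down only if all regions are down."""
    all_regions_fail = (1 - single_region_sla) ** regions
    return 1 - all_regions_fail


# Single-region SLA of 99.95% deployed in two regions.
result = composite_sla_multiregion(0.9995, 2)
print(f"{result:.6%}")  # 99.999975%
```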
Scheduled Maintenance
Scheduled maintenance takes place during our maintenance window. The customer will be notified in advance by an e-mail sent to the registered e-mail address. It is possible that during this maintenance period the service or services are temporarily, completely, or partially out of use and, therefore, not available to the customer.
A scheduled maintenance message will contain the following information:
- Timeframe in which scheduled maintenance will take place
- Expected duration of scheduled maintenance
- The services affected by the scheduled maintenance
Scheduled maintenance is excluded from the availability calculations unless the period for the scheduled maintenance is exceeded and the service is therefore not available.