The State's data analytics technology is a flexible, high-performance, state-secured platform.
The Ohio Data Analytics initiative supports business problem solving with an open architecture for innovation.
Secure Hybrid platform
The State provides access to cloud compute and storage for analytical projects as part of a hybrid cloud / on-premises strategy. The hybrid platform is available to state agencies and gives them leading tools and the flexibility to run analytical workloads in the State's private data lake or in the cloud.
Bring your own tool (BYOT)
The Ohio Data Analytics platform lets agencies use their own tools (for data visualization, SQL, and so on) to optimize their architecture and run analytical processing at scale on massive datasets. The Ohio Data Analytics team can also help support application security within tools that access the data lake platform.
A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. The data lake provides a platform for real-time, self-service access and advanced analytics for data scientists.
The data lake is divided into four zones (see illustration below):
More information about data lake zones:
Landing Zone
The landing zone is where raw data from any source first lands and where data ingestion processes can run, if necessary, before the data moves to an agency zone. Ingestion processes in the landing zone may include data encryption, file normalization, data conversions, and transformations. This zone is mainly used by platform administrators and by data engineers who create data ingestion pipelines.
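As an illustration only, the file normalization and data conversion steps mentioned above could be sketched as follows. This is a minimal sketch, not the platform's actual pipeline: the function names (`normalize_file`, `convert_rows`, `ingest`), the CSV input, and the `latin-1` source encoding are all assumptions made for the example, and encryption is left out.

```python
import csv
import io
from typing import Dict, List


def normalize_file(raw_bytes: bytes, source_encoding: str = "latin-1") -> str:
    """File normalization: decode bytes to text and unify line endings to '\n'."""
    text = raw_bytes.decode(source_encoding)
    return text.replace("\r\n", "\n").replace("\r", "\n")


def convert_rows(text: str) -> List[Dict[str, str]]:
    """Data conversion: parse CSV text, trim whitespace, lower-case column names."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        rows.append({key.strip().lower(): value.strip() for key, value in row.items()})
    return rows


def ingest(raw_bytes: bytes, source_encoding: str = "latin-1") -> List[Dict[str, str]]:
    """Hypothetical landing-zone step: normalize the file, then convert its rows."""
    return convert_rows(normalize_file(raw_bytes, source_encoding))


# Made-up sample file with Windows line endings and stray whitespace.
sample = b"ID, Name \r\n1, Alice \r\n2, Bob \r\n"
landed = ingest(sample)
```

Keeping normalization and conversion as separate functions means each step can be tested on its own and reordered or extended (for example, with an encryption step) without touching the others.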
Agency Zone
Agency zones provide a centralized repository for each agency’s raw source data, which analysts and developers within the agency can access to satisfy a variety of analytics use cases. Agency zones are governed and managed by the agencies themselves. This is where end users of the platform access data to build downstream data analytics solutions.
Shared Data Zone
The open data zone provides access to public data sets on the platform. This zone can be accessed by all agencies to help enrich their data and improve data analytics solutions.
Refined Data Sets Zone
The refined zones are used to create refined data sets or plug-and-play data marts that satisfy reporting and analytics requirements. This adds flexibility and agility when augmenting data warehouse reporting solutions. The refined zones can provide cleansed, integrated data sets that improve reporting and analytics for agencies.
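To make "cleansed, integrated data sets" concrete, here is a small sketch of one way raw records might be cleansed and then joined into a refined set. The record layout, the `id` join key, and the helper names (`cleanse`, `integrate`) are hypothetical and chosen only for illustration; real refined data sets would follow each agency's own rules.

```python
from typing import Dict, List, Optional


def cleanse(records: List[Dict[str, Optional[str]]]) -> Dict[str, Dict[str, Optional[str]]]:
    """Trim strings, drop records with no id, and keep the first record per id."""
    cleaned: Dict[str, Dict[str, Optional[str]]] = {}
    for record in records:
        record_id = (record.get("id") or "").strip()
        if not record_id or record_id in cleaned:
            continue  # skip unusable or duplicate records
        cleaned[record_id] = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
    return cleaned


def integrate(people: Dict[str, dict], addresses: Dict[str, dict]) -> List[dict]:
    """Left-join people to addresses on id; a missing address yields city=None."""
    refined = []
    for record_id, person in people.items():
        merged = dict(person)
        merged["city"] = addresses.get(record_id, {}).get("city")
        refined.append(merged)
    return refined


# Made-up raw data with whitespace, a duplicate, and a record without an id.
raw_people = [
    {"id": " 1 ", "name": "Alice "},
    {"id": "1", "name": "Alice"},
    {"id": "", "name": "Unknown"},
    {"id": "2", "name": "Bob"},
]
raw_addresses = [{"id": "1", "city": " Columbus"}]

refined = integrate(cleanse(raw_people), cleanse(raw_addresses))
```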
Project Zone
The project zones are used by analysts and developers as a sandbox environment for research projects that integrate data from multiple data sets. These zones provide a secure area to scout, profile, explore, and analyze data to solve complex business problems, prototype solutions, and satisfy project requirements.
Metadata Management Zone
The metadata management zone is layered on top of the transient zone. It creates metadata and security for the data that has known use cases in the transient layer. This metadata can include data origination details and definitions, producing better understood and trusted data for downstream solutions. Additional security and encryption can also permit role-based access within the data lake.
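The pairing of descriptive metadata with role-based access described above can be pictured with a short sketch. The `DatasetMetadata` class, its fields, the role names, and the `can_read` helper are assumptions made for this example and do not reflect the platform's actual catalog or security model.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical catalog entry: where the data came from and who may read it."""
    name: str
    origin: str          # data origination details
    definition: str      # plain-language definition of the data set
    allowed_roles: FrozenSet[str] = field(default_factory=frozenset)


def can_read(metadata: DatasetMetadata, user_roles: Iterable[str]) -> bool:
    """Role-based check: access is granted if the user holds any allowed role."""
    return bool(metadata.allowed_roles & set(user_roles))


# Made-up entry and roles, for illustration only.
entry = DatasetMetadata(
    name="permits_raw",
    origin="hypothetical nightly agency export",
    definition="One row per permit application, as received from the source system.",
    allowed_roles=frozenset({"data_engineer", "agency_analyst"}),
)

analyst_allowed = can_read(entry, {"agency_analyst"})
public_allowed = can_read(entry, {"public"})
```

Storing the allowed roles alongside the origin and definition keeps the access rule and the description of the data in one record, so a downstream tool can check both before using a data set.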