A data lake contains structured and unstructured data, giving agencies access to more information to build advanced insights.

The State's data analytics technology is a flexible, high-performance, state-secured platform.

The Ohio Data Analytics initiative supports business problem solving with an open architecture for innovation.

Secure Hybrid platform

The State provides access to cloud compute and storage for analytical projects as part of a hybrid cloud and on-premises strategy. The hybrid platform is available to state agencies and offers leading tools and the flexibility to run analytical workloads in the State's private data lake or in the cloud.

Bring your own tool (BYOT)

The Ohio Data Analytics platform allows agencies to use their own tools for data visualization, SQL, and similar tasks, so they can optimize their architecture and run analytical processing at scale on massive datasets. The Ohio Data Analytics team can also help support application security within tools that access the data lake platform.

Data lake

A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. The data lake provides a platform for real-time, self-service access and advanced analytics for data scientists, line-of-business owners, and developers. When a business question arises, the data lake can be queried for relevant data, provided the user’s access has been administered. With the power of analytics, agencies will be able to predict what will happen and understand how to intervene for better results.

The concept of a data lake can be compared to a lake where water flows in, fills a reservoir, and flows out. The reservoir is the data set on which analytics are run, and the outflow is the result of the analysis.


The data lake is divided into the following zones (see illustration below):


More information about data lake zones:

Landing Zone

The landing zone is where raw data from any source first lands and where data ingestion processes can run, if necessary, before the data moves to an agency zone. Ingestion processes in the landing zone may include data encryption, file normalization, data conversions, and transformations. This zone is mainly used by platform administrators and by data engineers who create data ingestion pipelines.
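As a rough illustration of the file normalization and data conversion steps mentioned above, the following sketch cleans a small CSV extract. It is not the platform's actual pipeline: the function names, the column names, and the assumed MM/DD/YYYY source date format are all hypothetical.

```python
import csv
import io
from datetime import datetime


def normalize_header(name: str) -> str:
    """Lowercase a column name and replace spaces and hyphens with underscores."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def ingest_csv(raw_text: str, date_columns: set[str]) -> list[dict[str, str]]:
    """Normalize headers, trim values, and convert assumed MM/DD/YYYY dates to ISO 8601."""
    reader = csv.DictReader(io.StringIO(raw_text))
    records = []
    for row in reader:
        clean = {}
        for key, value in row.items():
            column = normalize_header(key)
            # Missing trailing fields come back as None; treat them as empty.
            value = (value or "").strip()
            if column in date_columns and value:
                # Data conversion step (hypothetical source format: MM/DD/YYYY).
                value = datetime.strptime(value, "%m/%d/%Y").date().isoformat()
            clean[column] = value
        records.append(clean)
    return records


# Made-up sample extract with inconsistent spacing in headers and values.
raw = "Case ID, Open Date ,County\n 1001 ,03/15/2021, Franklin \n"
landed = ingest_csv(raw, date_columns={"open_date"})
print(landed)
```

A real ingestion pipeline would also cover encryption and error handling for malformed rows, which this sketch leaves out.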

Agency Zone

The agency zones on the platform provide a centralized repository for each agency’s raw source data, which analysts and developers within the agency can access to satisfy a variety of analytics use cases. Agency zones are governed and managed by the agencies themselves. This is where end users of the platform access data to build downstream data analytics solutions.

Shared Data Zone

The shared data zone provides access to public data sets on the platform. This zone can be accessed by all agencies to help enrich their data and improve data analytics solutions.

Refined Data Sets Zone

The refined zones are used for creating refined data sets or plug-and-play data marts that satisfy reporting and analytics requirements. They can provide more flexibility and agility when augmenting data warehouse reporting solutions, and they supply cleansed, integrated data sets that agencies can use to improve reporting and analytics.
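To make "cleansed, integrated data sets" concrete, here is a minimal sketch that standardizes and de-duplicates one data set and then joins it to a reference data set. The record layouts, field names, and sample values are invented for illustration and do not describe any actual agency data.

```python
def cleanse(records: list[dict[str, str]]) -> list[dict[str, str]]:
    """Standardize county names and drop exact duplicates, preserving order."""
    seen = set()
    cleaned = []
    for rec in records:
        rec = {**rec, "county": rec["county"].strip().title()}
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(rec)
    return cleaned


def integrate(cases: list[dict], counties: list[dict]) -> list[dict]:
    """Left-join case records to county reference data on the county name."""
    region_by_county = {c["county"]: c["region"] for c in counties}
    return [{**case, "region": region_by_county.get(case["county"])} for case in cases]


# Hypothetical inputs: a raw case extract and a county reference table.
cases = [
    {"case_id": "1001", "county": " franklin "},
    {"case_id": "1001", "county": "Franklin"},
    {"case_id": "1002", "county": "CUYAHOGA"},
]
counties = [
    {"county": "Franklin", "region": "Central"},
    {"county": "Cuyahoga", "region": "Northeast"},
]

data_mart = integrate(cleanse(cases), counties)
print(data_mart)
```

The resulting list plays the role of a small data mart: one row per case, with a consistent county name and an added region column ready for reporting.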

Project Zone

The project zones are used by analysts and developers as a sandbox environment for research projects that integrate data from multiple data sets. These zones provide a secure area to scout, profile, explore and analyze data to solve complex business problems, prototype solutions, and satisfy any project requirements.
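Profiling a data set in a sandbox often starts with simple per-column statistics. The sketch below shows one possible shape for such a check; the `profile` function and the sample records are hypothetical and are not part of the platform.

```python
def profile(records: list[dict]) -> dict[str, dict[str, int]]:
    """Report row count, missing values, and distinct values for each column."""
    columns = sorted({key for rec in records for key in rec})
    report = {}
    for col in columns:
        values = [rec.get(col) for rec in records]
        # Treat both None and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


# Made-up sample records with one missing county value.
sample = [
    {"case_id": "1001", "county": "Franklin"},
    {"case_id": "1002", "county": ""},
    {"case_id": "1003", "county": "Franklin"},
]

report = profile(sample)
for column, stats in report.items():
    print(column, stats)
```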

Metadata Management Zone

The metadata management zone is layered on top of the transient zone. It creates metadata and security for the data that has known use cases in the transient layer. This metadata can include data origination details and definitions, producing better understood and trusted data for downstream solutions. Additional security and encryption can also permit role-based access within the data lake.
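One way to picture how metadata and role-based access fit together is a catalog entry that records a data set's origin and definition along with the roles allowed to read it. This is only a sketch under assumed names: `DatasetMetadata`, `can_read`, the role names, and the catalog entry are hypothetical, not the platform's actual model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetMetadata:
    name: str
    source_system: str  # data origination detail
    definition: str     # business definition of the data set
    allowed_roles: frozenset[str] = field(default_factory=frozenset)


def can_read(metadata: DatasetMetadata, user_roles: set[str]) -> bool:
    """Grant read access when the user holds at least one allowed role."""
    return bool(metadata.allowed_roles & user_roles)


# Hypothetical catalog entry.
case_records = DatasetMetadata(
    name="case_records",
    source_system="agency case management system",
    definition="One row per case opened by the agency.",
    allowed_roles=frozenset({"agency_analyst", "data_engineer"}),
)

print(can_read(case_records, {"agency_analyst"}))
print(can_read(case_records, {"public_viewer"}))
```

Keeping the access rule next to the descriptive metadata means a single lookup answers both "what is this data?" and "who may read it?".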