Big Data
Big Data is the handling and analysis of datasets too large or complex for traditional processing tools. On this site, it matters because the skills carry across technical, operational, and venture work rather than staying confined to a single context.
Learn more: https://en.wikipedia.org/wiki/Big_data
Amazon EMR is a managed big data platform for processing large datasets.
Amazon Kinesis is a real-time data streaming platform for ingestion and processing.
Apache Druid is a real-time analytics database optimized for fast queries.
Apache Kafka is a distributed event streaming platform for real-time data pipelines.
Apache Spark is a distributed data processing engine for large-scale computation.
Designed a unified data model to integrate any data source into a big data geospatial analytics program, handling over 100TB of data in GCS and BigQuery with...
Projects building data pipelines, warehouses, lakes, and large-scale analytics infrastructure.
Solutions architect role redesigning a fundamentally flawed Pentaho ETL into a scalable Amazon Redshift data warehouse for a hospitality leader. Identified root...
Using SafeGraph location observation data and Databricks/Spark, built a geospatial model to quantify human risk around utility infrastructure — helping...
Managed large proprietary and sensitive datasets in Google Cloud using GCS, BigQuery, and Cloud Composer (Apache Airflow). Built Confluence documentation from scratch...