{"id":4583,"date":"2021-08-04T08:31:44","date_gmt":"2021-08-04T08:31:44","guid":{"rendered":"https:\/\/northbaysolutions.com\/?p=4583"},"modified":"2025-03-20T12:44:42","modified_gmt":"2025-03-20T12:44:42","slug":"amazon-athena-beyond-the-basics-part-1","status":"publish","type":"post","link":"https:\/\/northbaysolutions.com\/blog\/amazon-athena-beyond-the-basics-part-1\/","title":{"rendered":"Amazon Athena: Beyond The Basics \u2013 Part 1"},"content":{"rendered":"<div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-top:0px;--awb-padding-right:0px;--awb-padding-bottom:0px;--awb-padding-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:1310.4px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:30px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-image-element \" 
style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-1 hover-type-none\"><img loading=\"lazy\" decoding=\"async\" width=\"1396\" height=\"748\" title=\"amazon athena\" src=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/amazon-athena.png\" class=\"img-responsive wp-image-4711\" srcset=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/amazon-athena-200x107.png 200w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/amazon-athena-400x214.png 400w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/amazon-athena-600x321.png 600w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/amazon-athena-800x429.png 800w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/amazon-athena-1200x643.png 1200w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/amazon-athena.png 1396w\" sizes=\"auto, (max-width: 640px) 100vw, 1396px\" alt=\"\"><\/span><\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper 
fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-1 fusion-text-no-margin text-style\" style=\"--awb-font-size:28px;--awb-text-color:#000000;--awb-margin-bottom:50px;\"><h3 class=\"vc_custom_heading b_heading first_heading\"><strong>Working with Twitter (complex JSON) data set<\/strong><\/h3>\n<\/div><div class=\"fusion-text fusion-text-2 fusion-text-no-margin case-study-fusion-text-fix\" style=\"--awb-margin-bottom:20px;\"><p>Amazon Athena is a serverless interactive query service that lets you analyze data residing in S3 using standard SQL. Before Athena, querying data sets on S3 required installing Hive, Presto, Hue, or similar tools on top of the EMR service, or integrating with third-party partner products.<\/p>\n<p>Athena also supports JDBC connectivity, so the managed service can be easily integrated with a wide variety of SQL and visualization tools.<\/p>\n<p>https:\/\/aws.amazon.com\/athena\/<\/p>\n<\/div><div class=\"fusion-text fusion-text-3 fusion-text-no-margin case-study-fusion-text-fix\" style=\"--awb-margin-top:20px;--awb-margin-bottom:0px;\"><p>Many customers are interested in exploring Amazon Athena for their use cases and are looking for ways to optimize performance and cost. As an APN Partner, NorthBay has been testing Athena and exploring various customer use cases. This multi-part blog series shares our findings and gives readers a jumpstart on working with Amazon Athena.<\/p>\n<\/div><div class=\"fusion-text fusion-text-4 fusion-text-no-margin\" style=\"--awb-font-size:28px;--awb-text-color:#000000;--awb-margin-bottom:50px;\"><h3><strong>Twitter use case<\/strong><\/h3>\n<\/div><div class=\"fusion-text fusion-text-5 fusion-text-no-margin\" style=\"--awb-text-color:#000000;--awb-margin-bottom:30px;\"><p>Unstructured and semi-structured data (typically JSON) is becoming typical for Big Data sets. 
We chose Twitter data as the data set to validate working with complex JSON in Athena. This post shares the details of querying Twitter data using Athena and executing complex queries on the data set.<\/p>\n<p>The implementation uses the following architecture:<\/p>\n<\/div><div class=\"fusion-image-element \" style=\"text-align:center;--awb-margin-bottom:30px;--awb-max-width:100%;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-2 hover-type-none\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"64\" title=\"twitter\" src=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/twitter-300x64.png\" class=\"img-responsive wp-image-4588\" srcset=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/twitter-200x43.png 200w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/twitter-400x86.png 400w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/twitter-600x129.png 600w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/twitter-800x172.png 800w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/twitter-1200x258.png 1200w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/twitter.png 1458w\" sizes=\"auto, (max-width: 640px) 100vw, 300px\" alt=\"\"><\/span><\/div><div class=\"fusion-text fusion-text-6 fusion-text-no-margin\" style=\"--awb-margin-bottom:50px;\"><ul>\n<li>Configure Twitter for API access<\/li>\n<li>Configure Kinesis Firehose to stream the 
output to S3<\/li>\n<li>Configure and run Tweepy to read the Twitter feed and stream it to Kinesis Firehose<\/li>\n<li>Define schema definition in Athena<\/li>\n<li>Query Twitter data from the Athena Query Editor<\/li>\n<li>Query Twitter data using a JDBC connection<\/li>\n<li>Query Twitter data from QuickSight<\/li>\n<\/ul>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-2 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-7\" style=\"--awb-font-size:20px;--awb-text-color:#000000;--awb-margin-top:20px;\"><h3><strong>Configure Twitter for API access<\/strong><\/h3>\n<\/div><div class=\"fusion-text fusion-text-8 fusion-text-no-margin\" style=\"--awb-margin-bottom:20px;\"><p>To create this platform, you will need an AWS account and a Twitter application. Sign in with your Twitter account and create a new application at https:\/\/apps.twitter.com\/. Make sure your application is set for \u2018read-only\u2019 access. Next, choose Create My Access Token at the bottom of the Keys and Access Tokens tab. By this point, you should have four Twitter application keys: consumer key (API key), consumer secret (API secret), access token, and access token secret. 
Take note of these keys.<\/p>\n<\/div><div class=\"fusion-text fusion-text-9 fusion-text-no-margin\" style=\"--awb-margin-bottom:50px;\"><p>https:\/\/aws.amazon.com\/blogs\/big-data\/building-a-near-real-time-discovery-platform-with-aws\/<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-3 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-10\" style=\"--awb-font-size:20px;\"><h3><strong>Configure Kinesis Firehose to stream the output to S3<\/strong><\/h3>\n<\/div><div class=\"fusion-text fusion-text-11\"><p>Create a Kinesis Firehose Delivery Stream as the destination for our data.<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-4 fusion_builder_column_1_3 1_3 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:33.333333333333%;--awb-margin-top-large:0px;--awb-spacing-right-large:5.76%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:5.76%;--awb-width-medium:33.333333333333%;--awb-order-medium:0;--awb-spacing-right-medium:5.76%;--awb-spacing-left-medium:5.76%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-image-element \" 
style=\"--awb-margin-bottom:30px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-3 hover-type-none\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"155\" title=\"Athena-Image-2\" src=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-2-300x155.png\" class=\"img-responsive wp-image-4590\" srcset=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-2-200x103.png 200w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-2.png 400w\" sizes=\"auto, (max-width: 640px) 100vw, 300px\" alt=\"\"><\/span><\/div><div class=\"fusion-image-element \" style=\"--awb-margin-top:20px;--awb-max-width:100%;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-4 hover-type-zoomin\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"80\" title=\"Athena-Image-3\" src=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-3-300x80.png\" class=\"img-responsive wp-image-4591\" 
srcset=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-3-200x54.png 200w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-3-400x107.png 400w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-3-600x161.png 600w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-3-800x214.png 800w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-3.png 911w\" sizes=\"auto, (max-width: 640px) 100vw, 400px\" alt=\"\"><\/span><\/div><div class=\"fusion-text fusion-text-12\" style=\"--awb-margin-top:20px;\"><p>Choose \u201cCreate bucket\u201d:<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-5 fusion_builder_column_2_3 2_3 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:66.666666666667%;--awb-margin-top-large:0px;--awb-spacing-right-large:2.88%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:0px;--awb-width-medium:66.666666666667%;--awb-order-medium:0;--awb-spacing-right-medium:2.88%;--awb-spacing-left-medium:0px;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-13\" style=\"--awb-margin-left:0px;\"><p>Step 1: Configure Destination: Choose \u201cAmazon S3\u201d as Destination and select the existing S3 bucket or create a new Bucket for Firehose to persist the data.<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-6 fusion_builder_column_1_3 1_3 fusion-flex-column\" 
style=\"--awb-bg-size:cover;--awb-width-large:33.333333333333%;--awb-margin-top-large:0px;--awb-spacing-right-large:5.76%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:5.76%;--awb-width-medium:33.333333333333%;--awb-order-medium:0;--awb-spacing-right-medium:5.76%;--awb-spacing-left-medium:5.76%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-image-element \" style=\"--awb-margin-bottom:20px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-5 hover-type-none\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"133\" title=\"Athena-Image-4\" src=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-4-300x133.png\" class=\"img-responsive wp-image-4606\" srcset=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-4-200x89.png 200w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-4-400x178.png 400w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-4-600x266.png 600w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-4-800x355.png 800w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-4.png 975w\" sizes=\"auto, (max-width: 640px) 100vw, 400px\" alt=\"\"><\/span><\/div><div class=\"fusion-image-element \" 
style=\"--awb-margin-top:30px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-6 hover-type-none\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"115\" title=\"Athena-Image-5\" src=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-5-300x115.png\" class=\"img-responsive wp-image-4607\" srcset=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-5-200x77.png 200w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-5-400x153.png 400w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-5-600x230.png 600w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-5-800x306.png 800w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-5.png 975w\" sizes=\"auto, (max-width: 640px) 100vw, 400px\" alt=\"\"><\/span><\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-7 fusion_builder_column_2_3 2_3 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:66.666666666667%;--awb-margin-top-large:0px;--awb-spacing-right-large:2.88%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:2.88%;--awb-width-medium:66.666666666667%;--awb-order-medium:0;--awb-spacing-right-medium:2.88%;--awb-spacing-left-medium:2.88%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow 
fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-14 fusion-text-no-margin\" style=\"--awb-margin-top:20px;--awb-margin-bottom:20px;--awb-margin-left:0px;\"><p>Once the bucket is created, add a prefix for the data. In this case, the json\/ prefix is added so that all JSON data goes to the same bucket\/prefix.<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-8 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-15\" style=\"--awb-margin-top:20px;\"><p>Step 2: Configuration: Kinesis Firehose lets you configure buffer size, buffer interval, compression, encryption, and security policies. 
These values can be chosen based on the streaming ingest frequency and optimal size of the output file in S3.<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-9 fusion_builder_column_1_3 1_3 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:33.333333333333%;--awb-margin-top-large:0px;--awb-spacing-right-large:5.76%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:5.76%;--awb-width-medium:33.333333333333%;--awb-order-medium:0;--awb-spacing-right-medium:5.76%;--awb-spacing-left-medium:5.76%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-image-element \" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-7 hover-type-none\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"199\" title=\"Athena-Image-6\" src=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-6-300x199.png\" class=\"img-responsive wp-image-4611\" srcset=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-6-200x133.png 200w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-6-400x266.png 400w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-6-600x399.png 600w, 
https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-6-800x532.png 800w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-6.png 975w\" sizes=\"auto, (max-width: 640px) 100vw, 400px\" alt=\"\"><\/span><\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-10 fusion_builder_column_1_3 1_3 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:33.333333333333%;--awb-margin-top-large:0px;--awb-spacing-right-large:5.76%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:5.76%;--awb-width-medium:33.333333333333%;--awb-order-medium:0;--awb-spacing-right-medium:5.76%;--awb-spacing-left-medium:5.76%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-image-element \" style=\"--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-8 hover-type-none\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"53\" title=\"Athena-Image-7\" src=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-7-300x53.png\" class=\"img-responsive wp-image-4612\" srcset=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-7-200x35.png 200w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-7-400x71.png 400w, 
https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-7-600x106.png 600w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-7-800x142.png 800w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-7.png 975w\" sizes=\"auto, (max-width: 640px) 100vw, 400px\" alt=\"\"><\/span><\/div><div class=\"fusion-text fusion-text-16\" style=\"--awb-margin-top:30px;\"><p>In the new window, click \u201cAllow\u201d without changing anything.<\/p>\n<p>Step 3: Review:<\/p>\n<p>Review the configuration and click \u201cCreate Delivery Stream\u201d.<\/p>\n<\/div><div class=\"fusion-image-element \" style=\"--awb-margin-bottom:50px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-9 hover-type-none\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"185\" title=\"Athena-Image-8\" src=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-8-300x185.png\" class=\"img-responsive wp-image-4613\" srcset=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-8-200x123.png 200w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-8-400x246.png 400w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-8-600x369.png 600w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-8-800x492.png 800w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-8.png 975w\" sizes=\"auto, (max-width: 640px) 100vw, 
400px\" alt=\"\"><\/span><\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-11 fusion_builder_column_1_3 1_3 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:33.333333333333%;--awb-margin-top-large:300px;--awb-spacing-right-large:5.76%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:5.76%;--awb-width-medium:33.333333333333%;--awb-order-medium:0;--awb-spacing-right-medium:5.76%;--awb-spacing-left-medium:5.76%;--awb-width-small:100%;--awb-order-small:0;--awb-margin-top-small:0px;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-17\"><p>In the Firehose Delivery Stream console, we can see our created Delivery Stream with status \u201cCREATING\u201d. Once the status changes to \u201cACTIVE\u201d we can start using the delivery stream.<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-12 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-18 fusion-text-no-margin\" style=\"--awb-font-size:20px;--awb-margin-bottom:30px;\"><h3><strong>Ingest Twitter feeds from the feeder system (Tweepy\/Python)<\/strong><\/h3>\n<\/div><div class=\"fusion-text fusion-text-19\"><p>http:\/\/docs.tweepy.org\/en\/v3.5.0\/index.html<\/p>\n<p>We 
need a stream producer\/feeder system to publish streaming data to Kinesis Firehose. Tweepy is an open-source Python library that enables communication with Twitter. The following code can be run on an EC2 instance (with an IAM role that grants access to Kinesis Firehose, and with the Twitter API credentials from the earlier step stored in a configuration file) to feed the stream that we created earlier.<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-13 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-border-color:#c9c9c9;--awb-border-top:2px;--awb-border-right:2px;--awb-border-bottom:2px;--awb-border-left:2px;--awb-border-style:solid;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-20 fusion-text-no-margin\" style=\"--awb-margin-top:20px;--awb-margin-right:20px;--awb-margin-bottom:30px;--awb-margin-left:20px;\"><p>import tweepy<br \/>\nfrom tweepy import Stream<br \/>\nfrom tweepy import OAuthHandler<br \/>\nfrom tweepy.streaming import StreamListener<br \/>\nimport time<br \/>\nimport argparse<br \/>\nimport string<br \/>\nimport config<br \/>\nimport json<br \/>\nimport boto3<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-14 fusion_builder_column_1_1 1_1 fusion-flex-column\" 
style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-21 fusion-text-no-margin blog-text-style\" style=\"--awb-margin-top:20px;--awb-margin-bottom:50px;\"><p>def get_parser():<br \/>\n\"\"\"Get parser for command line arguments.\"\"\"<br \/>\nparser = argparse.ArgumentParser(description=\"Twitter Downloader\")<br \/>\nparser.add_argument(\"-q\",<br \/>\n\"--query\",<br \/>\ndest=\"query\",<br \/>\nhelp=\"Query\/Filter\",<br \/>\ndefault='-')<br \/>\nparser.add_argument(\"-d\",<br \/>\n\"--data-dir\",<br \/>\ndest=\"data_dir\",<br \/>\nhelp=\"Output\/Data Directory\")<br \/>\nreturn parser<\/p>\n<p>class MyListener(StreamListener):<br \/>\n\"\"\"Custom StreamListener for streaming data.\"\"\"<\/p>\n<p>def __init__(self, data_dir, query):<br \/>\nquery_fname = format_filename(query)<br \/>\nself.outfile = \"%s\/stream_%s.json\" % (data_dir, query_fname)<\/p>\n<p>def on_data(self, data):<br \/>\ntry:<br \/>\nresult = send_record_to_firehose(data)<br \/>\nprint(result)<br \/>\nexcept BaseException as e:<br \/>\nprint(\"Error on_data: %s\" % str(e))<br \/>\ntime.sleep(5)<br \/>\nreturn True<\/p>\n<p>def on_error(self, status):<br \/>\nprint(\"Error with status:\" + str(status))<br \/>\nif status == 420:<br \/>\nprint(\"You are being rate limited!!!.\")<br \/>\nreturn True<\/p>\n<p>def send_record_to_firehose(data):<br \/>\n\"\"\"Send the JSON response from Twitter to the Kinesis Firehose delivery stream.<br \/>\nArguments:<br \/>\ndata -- JSON data from Twitter<br 
\/>\nReturn:<br \/>\ndict -- response from Kinesis Firehose<br \/>\n\"\"\"<br \/>\nclient = boto3.client('firehose')<br \/>\nresponse = client.put_record(<br \/>\nDeliveryStreamName='twitter-to-s3',<br \/>\nRecord={'Data': data}<br \/>\n)<br \/>\nreturn response<\/p>\n<p>def format_filename(fname):<br \/>\n\"\"\"Convert file name into a safe string.<\/p>\n<p>Arguments:<br \/>\nfname -- the file name to convert<br \/>\nReturn:<br \/>\nString -- converted file name<br \/>\n\"\"\"<br \/>\nreturn ''.join(convert_valid(one_char) for one_char in fname)<\/p>\n<p>def convert_valid(one_char):<br \/>\n\"\"\"Convert a character into '_' if invalid.<\/p>\n<p>Arguments:<br \/>\none_char -- the char to convert<br \/>\nReturn:<br \/>\nCharacter -- converted char<br \/>\n\"\"\"<br \/>\nvalid_chars = \"-_.%s%s\" % (string.ascii_letters, string.digits)<br \/>\nif one_char in valid_chars:<br \/>\nreturn one_char<br \/>\nelse:<br \/>\nreturn '_'<\/p>\n<p>@classmethod<br \/>\ndef parse(cls, api, raw):<br \/>\nstatus = cls.first_parse(api, raw)<br \/>\nsetattr(status, 'json', json.dumps(raw))<br \/>\nreturn status<\/p>\n<p>if __name__ == '__main__':<br \/>\nparser = get_parser()<br \/>\nargs = parser.parse_args()<br \/>\nauth = OAuthHandler(config.consumer_key, config.consumer_secret)<br \/>\nauth.set_access_token(config.access_token, config.access_secret)<br \/>\napi = tweepy.API(auth)<br \/>\n# Reconnect whenever the stream fails<br \/>\nwhile True:<br \/>\ntry:<br \/>\ntwitter_stream = Stream(auth, MyListener(args.data_dir, args.query))<br \/>\ntwitter_stream.filter(track=[args.query])<br \/>\nexcept Exception:<br \/>\n# The connection dropped; reconnect and keep tracking<br \/>\ncontinue<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-15 fusion_builder_column_1_1 1_1 fusion-flex-column\" 
style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-22 fusion-text-no-margin\" style=\"--awb-font-size:20px;--awb-margin-top:20px;--awb-margin-bottom:30px;\"><h3><strong>Define the schema in Athena<\/strong><\/h3>\n<\/div><div class=\"fusion-text fusion-text-23\" style=\"--awb-margin-top:20px;\"><p>Athena provides the Catalog Manager UI for defining new tables. With complex JSON, however, it is easier to run the schema definition DDL in the query editor. The following DDL was built from the Twitter data stream:<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-16 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-border-color:#c4c4c4;--awb-border-top:2px;--awb-border-right:2px;--awb-border-bottom:2px;--awb-border-left:2px;--awb-border-style:solid;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:30px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-24 fusion-text-no-margin blog-text-style\" 
style=\"--awb-margin-top:20px;--awb-margin-right:20px;--awb-margin-bottom:50px;--awb-margin-left:20px;\"><p>CREATE EXTERNAL TABLE IF NOT EXISTS tweets (<br \/>\ncreated_at string ,<br \/>\nid string ,<br \/>\nid_str string ,<br \/>\ntext string ,<br \/>\ndisplay_text_range ARRAY,<br \/>\nsource string ,<br \/>\ntruncated string ,<br \/>\nuser struct&lt; id:string , id_str:string , name:string , screen_name:string , location:string , description:string , protected:string , verified:string , followers_count:string , friends_count:string , listed_count:string , favourites_count:string , statuses_count:string , created_at:string , utc_offset:string , time_zone:string , geo_enabled:string , lang:string&gt;,<br \/>\nis_quote_status string,<br \/>\nextended_tweet STRUCT&lt; full_text:string, display_text_range:ARRAY,<br \/>\nentities:STRUCT&lt; media:ARRAY&lt;STRUCT&lt; id:string, id_str:string, indices:ARRAY,<br \/>\nmedia_url:string,<br \/>\nmedia_url_https:string,<br \/>\nurl:string,<br \/>\ndisplay_url:string,<br \/>\nexpanded_url:string,<br \/>\ntype:string,<br \/>\nsizes:STRUCT&lt; small:STRUCT&lt;w:string, h:string, resize:string&gt;,<br \/>\nthumb:STRUCT&lt;w:string, h:string, resize:string&gt;&gt;&gt;&gt;&gt;&gt;,<br \/>\nretweet_count string,<br \/>\nfavorite_count string,<br \/>\nretweeted_status STRUCT&lt; retweet_count:string, text:string&gt;,<br \/>\nentities STRUCT&lt; urls:ARRAY&lt;STRUCT&lt;url:string, expanded_url:string, display_url:string, indices:ARRAY&gt;&gt;,<br \/>\nuser_mentions:ARRAY&lt;STRUCT&gt;,<br \/>\nhashtags:ARRAY&gt;,<br \/>\nfavorited string,<br \/>\nretweeted string,<br \/>\npossibly_sensitive string,<br \/>\nfilter_level string,<br \/>\nlang string)<br \/>\nROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'<br \/>\nLOCATION 's3:\/\/twitter-to-s3\/json\/';<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-17 fusion_builder_column_1_1 1_1 fusion-flex-column\" 
style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-25 fusion-text-no-margin\" style=\"--awb-font-size:20px;--awb-margin-top:20px;--awb-margin-bottom:30px;\"><h3><strong>Query Twitter data from Athena Query Editor<\/strong><\/h3>\n<\/div><div class=\"fusion-text fusion-text-26\" style=\"--awb-margin-top:20px;\"><p>Athena Query Editor provides a UI to submit Queries to Athena. The response also captures the run time and the data scanned which are incredibly useful for estimating costs and optimizing queries.<\/p>\n<\/div><div class=\"fusion-image-element \" style=\"text-align:left;--awb-margin-bottom:50px;--awb-max-width:500px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none imageframe-10 hover-type-none\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"189\" title=\"Athena-Image-9\" src=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-9-300x189.png\" class=\"img-responsive wp-image-4623\" 
srcset=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-9-200x126.png 200w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-9-400x251.png 400w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-9-600x377.png 600w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-9-800x503.png 800w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-9.png 975w\" sizes=\"auto, (max-width: 640px) 100vw, 300px\" alt=\"\"><\/span><\/div><div class=\"fusion-text fusion-text-27 fusion-text-no-margin\" style=\"--awb-font-size:20px;--awb-margin-top:30px;--awb-margin-bottom:20px;\"><p><strong>Query 1: Total records in the table<\/strong><\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-18 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-border-color:#cecece;--awb-border-top:2px;--awb-border-right:2px;--awb-border-bottom:2px;--awb-border-left:2px;--awb-border-style:solid;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:50px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-28 fusion-text-no-margin\" style=\"--awb-margin-top:20px;--awb-margin-right:20px;--awb-margin-bottom:20px;--awb-margin-left:20px;\"><p>SELECT count(*) FROM tweets<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-19 fusion_builder_column_1_1 1_1 fusion-flex-column\" 
style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-29 fusion-text-no-margin\" style=\"--awb-font-size:20px;--awb-margin-top:20px;--awb-margin-bottom:20px;\"><p><strong>Query 2: Get a sample of 10,000 records<\/strong><\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-20 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-border-color:#d8d8d8;--awb-border-top:2px;--awb-border-right:2px;--awb-border-bottom:2px;--awb-border-left:2px;--awb-border-style:solid;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:50px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-30 fusion-text-no-margin\" style=\"--awb-margin-top:20px;--awb-margin-right:20px;--awb-margin-bottom:20px;--awb-margin-left:20px;\"><p>SELECT * FROM tweets LIMIT 10000<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-21 fusion_builder_column_1_1 1_1 fusion-flex-column\" 
style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-31 fusion-text-no-margin\" style=\"--awb-font-size:20px;--awb-margin-top:20px;--awb-margin-bottom:30px;\"><p><strong>Query Twitter data from SQL client with JDBC connection<\/strong><\/p>\n<\/div><div class=\"fusion-text fusion-text-32 fusion-text-no-margin blog-text-style\" style=\"--awb-margin-top:20px;--awb-margin-bottom:50px;\"><p>Detailed documentation is available from the following link to establish a connection to Athena from a client tool such as SQL Workbench:<\/p>\n<p>http:\/\/docs.aws.amazon.com\/athena\/latest\/ug\/connect-with-jdbc.html<\/p>\n<\/div><div class=\"fusion-text fusion-text-33 fusion-text-no-margin\" style=\"--awb-font-size:20px;--awb-margin-top:20px;--awb-margin-bottom:20px;\"><p><strong>Query 3: Top hashtags with at least 100 occurrences<\/strong><\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-22 fusion_builder_column_1_1 1_1 fusion-flex-column\" 
style=\"--awb-bg-size:cover;--awb-border-color:#d8d8d8;--awb-border-top:2px;--awb-border-right:2px;--awb-border-bottom:2px;--awb-border-left:2px;--awb-border-style:solid;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:50px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-34 fusion-text-no-margin blog-text-style\" style=\"--awb-margin-top:20px;--awb-margin-right:20px;--awb-margin-bottom:20px;--awb-margin-left:20px;\"><p>SELECT ht.text,count(*)<br \/>\nFROM tweets<br \/>\nCROSS JOIN UNNEST (entities.hashtags) AS t(ht)<br \/>\nGROUP BY ht.text<br \/>\nHAVING count(*)&gt;100<br \/>\nORDER by count(*) desc<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-23 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-35 fusion-text-no-margin\" style=\"--awb-font-size:20px;--awb-margin-top:20px;--awb-margin-bottom:20px;\"><p><strong>Query 4: Number of Tweets from verified accounts with the most followers<\/strong><\/p>\n<\/div><\/div><\/div><div 
class=\"fusion-layout-column fusion_builder_column fusion-builder-column-24 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-border-color:#cccccc;--awb-border-top:2px;--awb-border-right:2px;--awb-border-bottom:2px;--awb-border-left:2px;--awb-border-style:solid;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:50px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-36 fusion-text-no-margin blog-text-style\" style=\"--awb-margin-top:20px;--awb-margin-bottom:20px;--awb-margin-left:20px;\"><p>SELECT user.screen_name,user.name,max(cast(user.followers_count AS integer)),count(*)<br \/>\nFROM tweets<br \/>\nWHERE user.verified='true'<br \/>\nGROUP BY user.screen_name,user.name<br \/>\nORDER BY max(cast(user.followers_count AS integer)) DESC<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-25 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-37 fusion-text-no-margin\" 
style=\"--awb-font-size:20px;--awb-margin-top:20px;--awb-margin-bottom:20px;\"><p><strong>Query 5: Top URL mentions in Tweets<\/strong><\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-26 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-border-color:#d3d3d3;--awb-border-top:2px;--awb-border-right:2px;--awb-border-bottom:2px;--awb-border-left:2px;--awb-border-style:solid;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:50px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-38 fusion-text-no-margin\" style=\"--awb-margin-top:20px;--awb-margin-right:20px;--awb-margin-bottom:20px;--awb-margin-left:20px;\"><p>SELECT url_extract_host(u.expanded_url),<br \/>\ncount(*)<br \/>\nFROM tweets<br \/>\nCROSS JOIN UNNEST (entities.urls) AS t(u)<br \/>\nGROUP BY url_extract_host(u.expanded_url)<br \/>\nHAVING count(*)&gt;100<br \/>\nORDER by count(*) desc;<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-27 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow 
fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-39 fusion-text-no-margin\" style=\"--awb-font-size:20px;--awb-margin-top:20px;--awb-margin-bottom:20px;\"><p><strong>Query 6: Hashtags tweeted along with \u201cAmazon\u201d<\/strong><\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-28 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-border-color:#d6d6d6;--awb-border-top:2px;--awb-border-right:2px;--awb-border-bottom:2px;--awb-border-left:2px;--awb-border-style:solid;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:50px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-40 fusion-text-no-margin\" style=\"--awb-margin-top:20px;--awb-margin-right:20px;--awb-margin-bottom:20px;--awb-margin-left:20px;\"><p>WITH ht_list AS<br \/>\n(SELECT entities.hashtags<br \/>\nFROM tweets<br \/>\nCROSS JOIN UNNEST (entities.hashtags) AS t(ht)<br \/>\nWHERE ht.text LIKE 'amazon')<br \/>\nSELECT t AS \"hashtag\",count(*) AS \"occurrences\" FROM ht_list<br \/>\nCROSS JOIN UNNEST (hashtags) AS t(t)<br \/>\nGROUP BY t<br \/>\nORDER BY count(*) desc;<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-29 fusion_builder_column_1_1 1_1 fusion-flex-column\" 
style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-41 fusion-text-no-margin\" style=\"--awb-font-size:20px;--awb-margin-top:20px;--awb-margin-bottom:30px;\"><p><strong>Visualize Twitter data in QuickSight using Athena<\/strong><\/p>\n<\/div><div class=\"fusion-text fusion-text-42\" style=\"--awb-margin-top:20px;\"><p>The following blog post provides information on querying Athena from QuickSight:<\/p>\n<p>https:\/\/aws.amazon.com\/blogs\/big-data\/derive-insights-from-iot-in-minutes-using-aws-iot-amazon-kinesis-firehose-amazon-athena-and-amazon-quicksight\/<\/p>\n<p>QuickSight currently does not support complex JSON and expects every column to use one of its supported data types:<\/p>\n<p>http:\/\/docs.aws.amazon.com\/quicksight\/latest\/user\/data-source-limits.html<\/p>\n<p>The current dashboard displays sensitive Tweets by language from the data set:<\/p>\n<\/div><div class=\"fusion-image-element \" style=\"--awb-margin-bottom:20px;--awb-max-width:600px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);\"><span class=\" fusion-imageframe imageframe-none 
imageframe-11 hover-type-none\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"123\" title=\"Athena-Image-10\" src=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-10-300x123.png\" class=\"img-responsive wp-image-4630\" srcset=\"https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-10-200x82.png 200w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-10-400x164.png 400w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-10-600x246.png 600w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-10-800x328.png 800w, https:\/\/northbaysolutions.com\/wp-content\/uploads\/2021\/08\/Athena-Image-10.png 975w\" sizes=\"auto, (max-width: 640px) 100vw, 300px\" alt=\"\"><\/span><\/div><div class=\"fusion-text fusion-text-43 fusion-text-no-margin\" style=\"--awb-margin-top:20px;--awb-margin-bottom:50px;\"><p>The underlying query:<\/p>\n<\/div><div class=\"fusion-text fusion-text-44 fusion-text-no-margin\" style=\"--awb-font-size:20px;--awb-margin-top:20px;--awb-margin-bottom:20px;\"><p><strong>Query 7: Find the number of Tweets by language and sensitive media content<\/strong><\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-30 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-border-color:#e2e2e2;--awb-border-top:2px;--awb-border-right:2px;--awb-border-bottom:2px;--awb-border-left:2px;--awb-border-style:solid;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:50px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow 
fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-45 fusion-text-no-margin\" style=\"--awb-margin-top:20px;--awb-margin-right:20px;--awb-margin-bottom:20px;--awb-margin-left:20px;\"><p>SELECT lang,possibly_sensitive,count(*)<br \/>\nFROM tweets<br \/>\nGROUP BY lang, possibly_sensitive<\/p>\n<\/div><\/div><\/div><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-31 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-46 fusion-text-no-margin\" style=\"--awb-font-size:28px;--awb-margin-top:20px;--awb-margin-bottom:20px;\"><p><strong>Conclusion<\/strong><\/p>\n<\/div><div class=\"fusion-text fusion-text-47 blog-text-style\" style=\"--awb-margin-top:20px;\"><p>This Twitter analysis shows that Athena can handle use cases involving complex data formats (unstructured\/semi-structured), can be automated over JDBC connections, and can surface basic insights through QuickSight, although QuickSight's lack of support for arrays currently limits the analytics.<\/p>\n<p>Athena can be an excellent tool for \u201cS3 as a data lake\u201d use cases where the data is already staged in S3. As a serverless managed service, it takes precedence over previously used methods such as Presto\/Hive\/Impala on EMR.<\/p>\n<\/div><div class=\"fusion-text fusion-text-48 fusion-text-no-margin\" 
style=\"--awb-text-color:#a3a3a3;--awb-margin-bottom:30px;\"><p>\u201cAthena: Beyond the Basics \u2013 Part 2\u201d<\/p>\n<\/div><div class=\"fusion-text fusion-text-49 fusion-text-no-margin\" style=\"--awb-font-size:20px;--awb-margin-top:20px;--awb-margin-bottom:20px;\"><h3><strong>Additional references:<\/strong><\/h3>\n<\/div><div class=\"fusion-text fusion-text-50 blog-text-style\" style=\"--awb-margin-top:20px;\"><p>List of SQL statements supported by Athena<\/p>\n<p>http:\/\/docs.aws.amazon.com\/athena\/latest\/ug\/language-reference.html<\/p>\n<\/div><\/div><\/div><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":3,"featured_media":4711,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55,38],"tags":[57,88],"class_list":["post-4583","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog","category-big-data-data-lake-analytics","tag-all-industries","tag-exclude"],"_links":{"self":[{"href":"https:\/\/northbaysolutions.com\/wp-json\/wp\/v2\/posts\/4583","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/northbaysolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/northbaysolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/northbaysolutions.com\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/northbaysolutions.com\/wp-json\/wp\/v2\/comments?post=4583"}],"version-history":[{"count":4,"href":"https:\/\/northbaysolutions.com\/wp-json\/wp\/v2\/posts\/4583\/revisions"}],"predecessor-version":[{"id":24356,"href":"https:\/\/northbaysolutions.com\/wp-json\/wp\/v2\/posts\/4583\/revisions\/24356"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/northbaysolutions.com\/wp-json\/wp\/v2\/media\/4711"}],"wp:attachment":[{"href":"https:\/\/northbaysolutions.com\/wp-json\/wp\/v2\/media?parent=4583"}],"
wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/northbaysolutions.com\/wp-json\/wp\/v2\/categories?post=4583"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/northbaysolutions.com\/wp-json\/wp\/v2\/tags?post=4583"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}