Friday, April 27, 2018

Using Lambda and the new Firehose Console to transform data


This post shows you how to create a delivery stream that ingests sample data, transforms it, and stores both the source and the transformed data.


Amazon Kinesis Firehose is one of the easiest ways to prepare and load streaming data into the AWS ecosystem. Firehose was first released in October 2015, and it has evolved from a simple solution that stores your data without any modification into a delivery stream with transformation features. In July 2017, the delivery stream console was updated to offer more options and reduce the amount of work required to store and transform your data. This post guides you through setting up a proof of concept that will:

  1. Filter and transform sample data with an AWS Lambda function and store the results in S3.
  2. Keep the sample data in S3 for future analysis.
  3. Check the capabilities of the console, like encryption and compression.
  4. Take advantage of Firehose sample data producer (you won't need to create any script).


  • You will need an AWS account


Step 0: Access the Kinesis Firehose service

  1. Log in to the AWS console.
  2. Search for the Kinesis service with the "Find a service ..." text box, or find it as an item of the "Analytics" list.
  3. Click "Create delivery stream". If you have not created a delivery stream before, you will need to click "Get Started" first.

Step 1: Name and source

  1. Select a name for your delivery stream, for this demo I will use "deliveryStream2018".

  2. Choose a "Source", for this demo select "Direct PUT or other resources". Essentially you have two options here: Use a Kinesis Stream as the input for the delivery stream or you can send the records by other means:
  • PUT API: You will use this option if your custom application will feed the delivery stream directly with the AWS SDK.
  • Kinesis Agent: Use the agent to send information from logs produced by your applications, in other words, the agent will track the changes in your log files and send the information to the delivery stream.
  • AWS IoT: If you have an IoT ecosystem, you can use the rules to send messages to your Firehose stream. 
  • CloudWatch Logs: Sends any incoming log events that match a defined filter to your delivery stream.
  • CloudWatch Events: Delivers event information when a CloudWatch rule is matched.
  3. Click "Next".
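If you later replace the demo data with your own producer through the PUT API, the records are sent with the AWS SDK. The following is a minimal Node.js sketch using the `Firehose.putRecord` call of the AWS SDK for JavaScript (v2). It assumes the `aws-sdk` package is installed, credentials and region are configured, and the stream is named "deliveryStream2018" as above. The helper names `buildPutRecordParams` and `sendRecord` are hypothetical.

```javascript
'use strict';

// Hypothetical helper: builds the parameters expected by Firehose.putRecord() in the
// AWS SDK for JavaScript v2. A trailing newline keeps records on separate lines in S3,
// because Firehose does not add delimiters between records.
function buildPutRecordParams(streamName, payload) {
    return {
        DeliveryStreamName: streamName,
        Record: { Data: JSON.stringify(payload) + '\n' },
    };
}

// Hypothetical helper: sends one record. Requires the aws-sdk (v2) package and valid
// credentials; the SDK is loaded lazily so buildPutRecordParams works without it.
function sendRecord(streamName, payload, region) {
    const AWS = require('aws-sdk');
    const firehose = new AWS.Firehose({ region: region });
    return firehose.putRecord(buildPutRecordParams(streamName, payload)).promise();
}

// Example call (not executed here):
// sendRecord('deliveryStream2018',
//     { ticker_symbol: 'NGC', sector: 'HEALTHCARE', change: -0.08, price: 4.73 },
//     'us-east-1').then((res) => console.log('RecordId:', res.RecordId));
```

The SDK base64-encodes the `Data` field for you, which is why the Lambda function later receives base64 text.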

Step 2: Transform Records

  1. Once data is available in a delivery stream, we can invoke a Lambda function to transform it. To our relief, some ready-to-use blueprints are offered by AWS and you can adapt them according to your data format. In this tutorial, we will transform sample data offered by Firehose, so select "Enabled".

  2. Select "Create New".
  3. You will see a list of blueprints. We will process custom data, so select the first one, "General Firehose Processing". A new page will open; do not close the previous one, as we will come back to it.

  4. The Lambda "Create function" page will open.
  5. Choose a "Name" for your function. 
  6. In the "Role" dropdown, select "Create new role from template(s)"; this will create a new role that allows this Lambda function to write logs to CloudWatch. Choose a "Role name"; you may want to remember it so you can delete it quickly when we are done with the tutorial.
  7. Leave the "Policy templates" field empty.
  8. Once you are ready select "Create function" and wait for the editor to appear.

Step 2.1: Into the Lambda realm

  1. Scroll down until you see the "Function code" section

  2. Change "Runtime" to "Node.js 8.10".
  3. The "index.js" file should be available to edit; if it is not, open it by double-clicking the file name on the left side.
  4. Remove all the code, then copy the following function and paste it into the editor:

'use strict';
console.log('Loading function');

/* Stock Ticker format parser */
const parser = /^\{"ticker_symbol":"[A-Z]+","sector":"[A-Z]+","change":[-.0-9]+,"price":[-.0-9]+\}/i;

exports.handler = (event, context, callback) => {
    let success = 0; // Number of valid entries found
    let failure = 0; // Number of invalid entries found
    let dropped = 0; // Number of dropped entries

    /* Process the list of records and transform them */
    const output = event.records.map((record) => {
        const entry = Buffer.from(record.data, 'base64').toString('utf8');
        console.log('Entry: ', entry);
        const match = parser.exec(entry);
        if (match) {
            const parsedMatch = JSON.parse(match[0]);
            if (parsedMatch.sector !== 'RETAIL') {
                /* Dropped event, notify and leave the record intact */
                dropped++;
                return {
                    recordId: record.recordId,
                    result: 'Dropped',
                    data: record.data,
                };
            }
            /* Add timestamp and convert to CSV */
            const milliseconds = new Date().getTime();
            const result = `${milliseconds},${parsedMatch.ticker_symbol},${parsedMatch.sector},${parsedMatch.change},${parsedMatch.price}\n`;
            const payload = Buffer.from(result, 'utf8').toString('base64');
            /* Transformed event */
            success++;
            return {
                recordId: record.recordId,
                result: 'Ok',
                data: payload,
            };
        }
        /* Failed event, notify the error and leave the record intact */
        failure++;
        console.log('Failed event : ' + record.data);
        return {
            recordId: record.recordId,
            result: 'ProcessingFailed',
            data: record.data,
        };
    });
    console.log(`Processing completed. Successful records ${success}, dropped records ${dropped}, failed records ${failure}.`);
    callback(null, { records: output });
};

  5. Go back to the function menu (the header) and look for the dropdown where you can create a new test; it is right before the "Test" button. Select "Configure Test Event" in the dropdown. A secondary window will appear.
  6. Select "Create new test event" to create a new test, and "Kinesis Firehose" as "Event template".
  7. Select an "Event name".
  8. Copy and paste the following JSON object into the editor to use it as the input for your test.

{
  "records": [
    {
      "recordId": "49583354031560888214100043296632351296610463251381092354000000",
      "approximateArrivalTimestamp": 1523204766865,
      "data": "eyJ0aWNrZXJfc3ltYm9sIjoiTkdDIiwic2VjdG9yIjoiSEVBTFRIQ0FSRSIsImNoYW5nZSI6LTAuMDgsInByaWNlIjo0LjczfQ=="
    }
  ],
  "region": "us-east-1",
  "deliveryStreamArn": "arn:aws:kinesis:EXAMPLE",
  "invocationId": "invocationIdExample"
}

The data attribute is encoded in base64, which is how Firehose hands the records to the function. After decoding, the value of this data is:

{"ticker_symbol":"NGC","sector":"HEALTHCARE","change":-0.08,"price":4.73}

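Encoding test data by hand is error-prone. The following Node.js sketch builds an event with the same shape as the test event above from plain objects, and decodes the "data" attribute back into text. The function names are hypothetical and not part of any AWS API, and the generated recordId values are placeholders, not real Firehose identifiers.

```javascript
'use strict';

// Hypothetical helper: wraps plain objects in a Firehose-style test event.
// recordId values are placeholders, not real Firehose identifiers.
function buildTestEvent(items) {
    return {
        invocationId: 'invocationIdExample',
        deliveryStreamArn: 'arn:aws:kinesis:EXAMPLE',
        region: 'us-east-1',
        records: items.map((item, index) => ({
            recordId: String(index),
            approximateArrivalTimestamp: Date.now(),
            // Firehose hands record data to Lambda as base64 text
            data: Buffer.from(JSON.stringify(item), 'utf8').toString('base64'),
        })),
    };
}

// Hypothetical helper: decodes the base64 "data" attribute of a record.
function decodeData(record) {
    return Buffer.from(record.data, 'base64').toString('utf8');
}

const event = buildTestEvent([
    { ticker_symbol: 'NGC', sector: 'HEALTHCARE', change: -0.08, price: 4.73 },
]);
console.log(JSON.stringify(event, null, 2));
console.log(decodeData(event.records[0]));
```

For this object, the generated data attribute is the same base64 string used in the test event above, so you can add more records (for example, a "RETAIL" one) to exercise the "Ok" path as well.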
  9. Select "Create"; you will be taken back to the function editor.
  10. Make sure to press "Save" to save your changes in the editor.
  11. Now run your test by selecting it in the dropdown and pressing "Test".
  12. You should quickly get a green (successful) result; check the execution details to learn more.
    1. If you expand the "Details" section, you will be able to see the output.
    2. You may want to look at the Base64-decoded data of each record.
    3. In this case, the function keeps and transforms only the records whose sector is "RETAIL". The record that we are using for testing belongs to the "HEALTHCARE" sector, so it ends up as a "Dropped" record: it is not going to be part of the transformed set, but it did not cause an error either.
    4. A record that is part of the transformed set will have a result attribute of "Ok".
    5. You can remove the filter if you want to transform all your data.
  13. Now you can go back to the Kinesis Firehose tab; you can return to this tab later if you want to dig deeper.
  14. Back in the Firehose delivery stream wizard, close the "Choose Lambda blueprint" dialog.
  15. Select your newly created function in the "Lambda function" dropdown; refresh if necessary.
  16. Ignore the timeout warning: this Lambda function does not need much time to execute, so keep going and select "Next".
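Firehose reads the function's response record by record: every input recordId has to come back with a result of "Ok", "Dropped" or "ProcessingFailed" (note the exact casing) and base64 data. The sketch below is a hypothetical local check, not an AWS utility, that you could run against the output of your test before wiring the function to the stream.

```javascript
'use strict';

// Result values accepted by Firehose for transformed records (case-sensitive).
const VALID_RESULTS = new Set(['Ok', 'Dropped', 'ProcessingFailed']);

// Hypothetical helper: returns a list of problems; an empty list means the response looks valid.
function checkTransformationResponse(event, response) {
    if (!response || !Array.isArray(response.records)) {
        return ['response.records must be an array'];
    }
    const problems = [];
    const inputIds = new Set(event.records.map((r) => r.recordId));
    if (response.records.length !== event.records.length) {
        problems.push('expected ' + event.records.length + ' records, got ' + response.records.length);
    }
    response.records.forEach((r, i) => {
        if (!inputIds.has(r.recordId)) problems.push('record ' + i + ': unknown recordId');
        if (!VALID_RESULTS.has(r.result)) problems.push('record ' + i + ': invalid result "' + r.result + '"');
        if (typeof r.data !== 'string') problems.push('record ' + i + ': data must be a base64 string');
    });
    return problems;
}
```

For example, a response that uses "OK" instead of "Ok" is reported as one problem.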

Step 3: Choose a destination

We have configured a serverless function to transform our records, but we have not yet selected where to store them, nor whether we want to keep the raw records. In this case, we will do both.

  1. Select "Amazon S3" as destination for simplicity. This will be the service where we will store our transformed data.

  2. Select an existing bucket or create one.
  3. You may set a prefix for your files; I will use "transformed" to distinguish them from the source files. Firehose will add a timestamp automatically in any case.
  4. In "S3 backup", select "Enable" to store the raw data too.
  5. Select the backup bucket or create one; you may set a prefix for it too.
  6. Go ahead and press "Next".
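The sketch below illustrates where the files should land, assuming Firehose appends a UTC time prefix in the `YYYY/MM/DD/HH/` format after the prefix you typed. `expectedKeyPrefix` is a hypothetical helper for locating your objects, not part of Firehose. If your prefix does not end with "/", the time prefix is appended directly to it, so "transformed" would produce keys such as `transformed2018/04/27/15/...`.

```javascript
'use strict';

// Pads a number to two digits ("4" -> "04").
const pad2 = (n) => String(n).padStart(2, '0');

// Hypothetical helper: the prefix you configured followed by the UTC "YYYY/MM/DD/HH/" folders.
function expectedKeyPrefix(prefix, date) {
    return prefix +
        date.getUTCFullYear() + '/' +
        pad2(date.getUTCMonth() + 1) + '/' +
        pad2(date.getUTCDate()) + '/' +
        pad2(date.getUTCHours()) + '/';
}

// A record delivered on April 27, 2018 at 15:30 UTC
console.log(expectedKeyPrefix('transformed/', new Date(Date.UTC(2018, 3, 27, 15, 30))));
// -> transformed/2018/04/27/15/
```

Because the folders use UTC, they may not match your local date and hour.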

Step 4: Configure settings

  1. Leave your S3 buffer conditions as they are. They indicate the maximum amount of data that must be gathered or the maximum amount of time that must pass before Firehose delivers the buffered records to your S3 bucket. This is an OR condition: as soon as either of these limits is reached, the delivery happens.

  2. If you want to save space and secure your data, you can select your desired compression and encryption options. I am using the defaults for this tutorial.

  3. Error logging is enabled by default; you can keep it like that in case you want to debug your code later.

  4. We need an IAM role that lets Firehose access the corresponding resources, like S3. In "IAM role", choose "Create new, or Choose"; a new tab will open.

  5. As we have selected S3 in the previous steps, the IAM policy that we need has already been prepared for us. Review it if you are interested and press "Allow". The role will be created and the tab will be closed.

  6. The new role will be listed in the "IAM role" dropdown; you can select another one if needed.

  7. Select "Next" when ready.
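The OR semantics of the buffer conditions can be summarized as a tiny predicate. The values in `DEFAULT_BUFFER` (5 MB and 300 seconds) are assumed console defaults, so check the values shown in your own settings; a 300-second interval is also why the demo results can take up to about five minutes to appear. The function name is hypothetical.

```javascript
'use strict';

// Assumed console defaults; check the values shown in your delivery stream settings.
const DEFAULT_BUFFER = { sizeMB: 5, intervalSeconds: 300 };

// Hypothetical predicate: a batch is delivered as soon as EITHER threshold is reached.
function shouldDeliver(bufferedBytes, secondsSinceLastDelivery, cfg) {
    const conf = cfg || DEFAULT_BUFFER;
    const sizeReached = bufferedBytes >= conf.sizeMB * 1024 * 1024;
    const timeReached = secondsSinceLastDelivery >= conf.intervalSeconds;
    return sizeReached || timeReached;
}

console.log(shouldDeliver(6 * 1024 * 1024, 10)); // size limit reached first
console.log(shouldDeliver(1024, 300));           // time limit reached first
console.log(shouldDeliver(1024, 60));            // neither limit reached yet
```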

Step 5: Review your configuration

  1. Take a moment to check the options that you have chosen; when ready, select "Create delivery stream".

  2. You will be taken to the "Firehose delivery stream" page; you should see your new stream become active after a few seconds.

Step 6: Test your work

Firehose allows you to send demo data to your stream, let's try it out.

  1. Select your stream's radio button to enable the "Test with demo data" button.

  2. Click the "Test with demo data" button. You will see the "Test with demo data" section.
  3. Select "Start sending demo data".

  4. Do not leave this page until you complete the next steps, but be sure to stop the demo once you see the results in your S3 bucket(s) to save money. If you close the tab, the demo data should stop too.
  5. On this same page, scroll down and check the "Monitoring" tab. Wait two minutes and use the refresh button to see the changes in the metrics.

  6. Wait up to 5 minutes, then check your bucket for results; they will be inside folders representing the date. Download the files produced and look at the results. Your "source_records" folder has the backup data.

  7. What if something goes wrong? Where are the logs? You can check them in CloudWatch. In the "Monitoring" tab, you will see a link to the CloudWatch console; once there, select "Logs" in the menu, then look for your Lambda or Firehose logs in the list.
  8. Go back to the Firehose tab and select "Stop sending demo data".
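Each line of the transformed files follows the CSV layout built by the Lambda function in Step 2.1: epoch milliseconds, ticker symbol, sector, change and price. Below is a minimal sketch for inspecting a downloaded file locally; the function name, the sample line and the file path in the usage comment are placeholders.

```javascript
'use strict';

// Hypothetical helper: parses the "timestamp,ticker,sector,change,price" lines
// written by the Lambda function, skipping empty lines.
function parseTransformedCsv(text) {
    return text
        .split('\n')
        .filter((line) => line.trim() !== '')
        .map((line) => {
            const [timestamp, ticker, sector, change, price] = line.split(',');
            return {
                timestamp: new Date(Number(timestamp)),
                ticker,
                sector,
                change: Number(change),
                price: Number(price),
            };
        });
}

// Usage with a downloaded object (placeholder path):
// const fs = require('fs');
// console.log(parseTransformedCsv(fs.readFileSync('./downloaded-object', 'utf8')));
```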


Once you feel comfortable with the flow and the services used in this tutorial, it is a good idea to delete these resources.
If you are within the Free Tier, you will only incur costs while your Firehose delivery stream is being fed, or if you exceed the Lambda and S3 Free Tier limits, so as long as you are not producing and inserting data into the stream, you will not be charged. Still, it is a good idea to remove everything when you are done.

Delete the delivery stream

  1. Go to the Firehose console page.
  2. Select your delivery stream.
  3. Click the "Delete" or "Delete Delivery Stream" button, depending on the page you are on.

Delete S3 files and/or bucket

  1. Go to the S3 console page.
  2. Note: to select an item in S3, do not click the link; select the row or checkbox instead.
  3. If you only want to remove the files, open the bucket, select the folders inside it, and in the "More" menu, select "Delete".
  4. If you want to delete the bucket too, go back to the S3 console and select the destination bucket that you have used for this tutorial. Click the "Delete bucket" button.

Delete the Lambda function

  1. Access the Lambda console.
  2. On the left menu, select "Functions". Select your Lambda function and in the "Actions" menu, select "Delete".
  3. You can also delete the function directly from the function editor using "Actions" and then "Delete function".

Delete the Roles

  1. Remember that you have created two roles during this tutorial, one for Lambda and one for Firehose.
  2. Access the IAM console.
  3. Select "Roles" on the left menu.
  4. The one for Lambda was named by you in a previous step; look for it and select it. If you are not sure which one it is, you can check the creation time of the roles by using the gear on top of the list to show extra information; this role and the Firehose role should have been created around the same time.
  5. The role created for Firehose should be named "firehose_delivery_role" unless you have chosen a different name.
  6. To delete them, select them using the checkboxes next to the item and then click on "Delete Role". You will be presented with information about the roles to confirm they are the ones that you want.


Firehose is a fully managed service that automatically scales to match your throughput requirements without any ongoing administration, and you can extend its capabilities with Lambda functions. In this tutorial, we ingested data from a system that produces sample stock records, filtered and transformed it into a different format, and kept a copy of the raw data in S3 for future analysis.


Thursday, March 15, 2018

AWS Certified Solutions Architect Associate Tips

In this article, you will find relevant information about the topics and the sources that I used to prepare for and pass the AWS Certified Solutions Architect (CSA) - Associate exam.

The reason to share

First of all, learning from the experience of others is quite important to understand the current state of an exam whose questions change frequently. Also, AWS is well known for improving its services at a fast pace, so some questions in the exam may be outdated relative to the current functionality. I do not want this to be a full guide, because it would take me millions of words to write it; instead, I want to give you a guide that includes all the topics that I found in the exam and the level of knowledge that would help you pass it.


I took and passed the CSA Associate exam on Friday, February 9. One day before, I got certified as an AWS Certified Developer (CD) - Associate. I will only cover the Architect exam in this post.

My overall score: 91%.

The path that I have followed to get a score of 91%

I started studying intermittently in December 2017. During this time, and until the first week of February 2018, I completed the A Cloud Guru Solution Architect course. I also took a deep look into the FAQs and documentation for the topics that I found difficult; I will describe them later in this post.
I used the first two weeks of February to recap, which again included reading the documentation on EC2, EBS, Auto Scaling, ELB, and SQS, among other topics.

During this time, I found a free database of AWS questions. I studied two sets: I finished version SAA v1 and started v2, but the latter contains a lot of questions from the Professional level, so I decided to focus on the Associate one. By the way, even though the set that I studied included around 400 questions, only one of them appeared in the real exam. Does this mean that they are a waste of time? No; in my opinion, they do help, and it’s worth taking the time to answer them. You will learn which topics you need to work on harder, and the complexity of the questions is really similar to that of the exam. You will also find discussions in some of them, because not all the answers are correct; you can follow the threads to verify your answer.

I took the 3 exams that the A Cloud Guru Practice Exam Solution Architect offers at the beginning of February. These exams are different and help you validate knowledge in different areas each time. Here I got scores of:
  • First one: 80%
  • Second: 70%
  • Third: 75%

After that, I took the online AWS practice exams offered by Amazon through its exam provider:
  • Solutions Architect: 80%
  • Developer: 85%.

At IO Connect Services, as part of our career development, we formed a study group where we covered topics such as Cloud, Big Data, Systems Integration, and Software Engineering. We organized a series of study sessions with my teammates who are already certified; the one that helped me the most for this exam was about VPCs, because around 25% of the questions are related to this topic.

Deep dive into the exam

Single topic vs. Combination of topics

Some of the questions focus directly on the benefits of a specific service (e.g., S3) or on a specific feature (e.g., Cross-Region Replication in S3). Be aware that these kinds of questions are the minority.

Most of the questions combine two or more services, and topics like access, security, and cost are frequently used within a single question to test your knowledge (they change the final solution). This makes sense: you are not studying to be an expert in the isolated pieces of the puzzle; you should know how they fit together in order to provide an end-to-end solution.

The topics that I’ve found in the exam

In the following section, I will cover the topics that I found in my exam with helpful comments and links to them so you know where to complement your study. My intention is not to give you the answers, but a sense of level in each topic I’ve identified.

CloudFormation vs. Elastic Beanstalk

  • The difference between these services should not be hard to grasp. While CloudFormation provides a common language to describe and provision all the infrastructure resources in your cloud environment, Elastic Beanstalk is a quick and easy-to-use service for deploying and scaling web applications and services developed in popular programming languages.

SQS

  • Know the limits and defaults of the SQS messages:
    • Message retention
    • Message Throughput
    • Message size
    • Message visibility timeout
    • You may check the full list in the SQS Developer Guide.
  • Understand how to move from a standard queue to a FIFO queue: an existing standard queue cannot be converted, so you have to create a new FIFO queue.
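To make those limits concrete, here is a small Node.js sketch with the values documented for SQS at the time of writing (256 KB maximum message size, retention from 1 minute to 14 days with a 4-day default, visibility timeout from 0 seconds to 12 hours with a 30-second default). It also checks the FIFO naming rule, since a FIFO queue is created as a new queue whose name ends in ".fifo". The helper names are hypothetical; verify the numbers in the SQS Developer Guide before relying on them.

```javascript
'use strict';

// SQS limits as documented at the time of writing; verify in the SQS Developer Guide.
const SQS_LIMITS = {
    maxMessageBytes: 256 * 1024, // 256 KB
    retentionSeconds: { min: 60, max: 14 * 24 * 3600, default: 4 * 24 * 3600 },
    visibilityTimeoutSeconds: { min: 0, max: 12 * 3600, default: 30 },
};

// Hypothetical check: FIFO names use letters, digits, "-" or "_", end in ".fifo",
// and are at most 80 characters long including the suffix.
function isValidFifoQueueName(name) {
    return /^[A-Za-z0-9_-]{1,75}\.fifo$/.test(name);
}

// Hypothetical check: does the UTF-8 body fit in a single SQS message?
function fitsInOneMessage(body) {
    return Buffer.byteLength(body, 'utf8') <= SQS_LIMITS.maxMessageBytes;
}

console.log(isValidFifoQueueName('orders.fifo'), isValidFifoQueueName('orders'));
```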

Identity Federation vs. IAM

  • You may need to answer questions about the use cases for identity federation, cross-account access, and IAM (creating users and their default settings).
  • You will find at least one question related to the steps to use SAML-based federation.

DynamoDB

  • You do not need to know DynamoDB in depth, but you do need to learn about its headline features, like high scalability and flexibility. Take a look at the benefits here.

Route 53

  • A records, CNAME records, and Alias records.
  • Check the main features of Route 53.

RDS

  • Know that Multi-AZ helps you obtain high availability with regard to database failover.
  • Remember that you can use read-replicas for some DBMS to increase performance.
  • Be aware that some questions require access to the operating system of the instance running the database. As you will need to use EC2 for that, this automatically eliminates all the RDS options, because RDS does not give you direct access to the OS.
  • You may find an ElastiCache question; remember that you have two options for it:
    • Redis: Helps to manage and analyze fast moving data with a versatile in-memory data store.
    • Memcached: Helps to build a scalable Caching Tier for data-intensive apps.

CloudWatch vs. CloudTrail

EC2

  • Do not forget that, to encrypt the content of an existing unencrypted EBS volume, it is advised to stop the EC2 instance and take a snapshot, copy that snapshot with encryption enabled, and then create encrypted volumes from the encrypted copy.
  • A question that will appear is related to Spot Instances and cost, so remember that if Amazon terminates a Spot Instance, you will not be charged for the partial hour.

VPC

  • Know the difference between NAT Instances and NAT Gateways.
  • Review bastion hosts: remember that they allow access to your private instances, but you need to configure the security of both your private and public subnets.

VPC + EC2 + Security Group (SG) + Access Control List (ACL)

  • You need to fully understand the characteristics of SGs and ACLs and how they work. In short, you need to understand:
    • The default rules for the default SG and ACL.
    • The default rules for the custom SG and ACL that you create.
    • The meaning of stateful and stateless.
  • I recommend learning the topics described here
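The difference between stateful and stateless filtering is easier to see in a simplified model. In the sketch below (hypothetical helpers that only match protocol and port, ignoring CIDR and port ranges), a network ACL evaluates numbered rules in ascending order and stops at the first match, ending in an implicit deny, while a security group only has allow rules and automatically lets the response of an allowed connection back in. Because an ACL is stateless, that response would have to match a separate rule in the other direction.

```javascript
'use strict';

// Simplified model: rules only match protocol and port (no CIDR or port ranges).

// Network ACL (stateless): rules are checked in ascending ruleNumber order, the first
// matching rule decides, and traffic that matches nothing hits the implicit "*" deny.
function naclAllows(rules, packet) {
    const ordered = rules.slice().sort((a, b) => a.ruleNumber - b.ruleNumber);
    for (const rule of ordered) {
        if (rule.protocol === packet.protocol && rule.port === packet.port) {
            return rule.action === 'allow';
        }
    }
    return false;
}

// Security group (stateful): there are only allow rules, all of them are considered, and
// the response to a connection that was already allowed is let through automatically.
function securityGroupAllows(allowRules, packet, isResponseToAllowedConnection) {
    if (isResponseToAllowedConnection) {
        return true;
    }
    return allowRules.some((r) => r.protocol === packet.protocol && r.port === packet.port);
}

const acl = [
    { ruleNumber: 200, protocol: 'tcp', port: 22, action: 'allow' },
    { ruleNumber: 100, protocol: 'tcp', port: 22, action: 'deny' },
];
console.log(naclAllows(acl, { protocol: 'tcp', port: 22 })); // false: rule 100 is evaluated first
```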

VPC + EC2 + IPs

  • Remember how to assign a public IP to an instance:
  • You cannot change a subnet CIDR.
  • The subnet CIDR block size can be from /16 to /28.
  • Do not forget that subnets in the same VPC can communicate with each other by default, through the local route in the VPC route table.
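A quick way to reason about subnet sizes: a /n block has 2^(32-n) addresses, and AWS reserves 5 of them in every subnet (the first four and the last one). The sketch below, with a hypothetical helper name, applies that rule together with the /16 to /28 range mentioned above.

```javascript
'use strict';

// AWS reserves 5 addresses in every subnet (network address, VPC router, DNS,
// one reserved for future use, and the last address).
const RESERVED_PER_SUBNET = 5;

// Hypothetical helper: usable IPv4 addresses for a subnet CIDR such as "10.0.1.0/24".
function usableSubnetAddresses(cidr) {
    const prefix = Number(cidr.split('/')[1]);
    if (!Number.isInteger(prefix) || prefix < 16 || prefix > 28) {
        throw new RangeError('VPC subnet prefix must be between /16 and /28: ' + cidr);
    }
    return Math.pow(2, 32 - prefix) - RESERVED_PER_SUBNET;
}

console.log(usableSubnetAddresses('10.0.1.0/24')); // 256 - 5 = 251
console.log(usableSubnetAddresses('10.0.1.0/28')); // 16 - 5 = 11
```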

VPC + ELB + Auto Scaling


  • Know when to use a Classic, Network, or Application Load Balancer.
    • An Application Load Balancer (ALB) can route traffic to targets on different ports too.
    • An ALB can route traffic based on the content of the request, such as the path, so you can handle different microservices behind one load balancer.
    • Do you know when to use a Network Load Balancer (Layer 4) vs. an ALB (Layer 7)? Check this table.
    • Classic ELB was the first version and is mostly used by older configurations where no VPCs were set up. It is not deprecated, but it is not recommended for new designs.
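As a simplified illustration of content-based routing, the sketch below models an ALB listener with path-pattern rules: rules are evaluated in priority order, the first matching pattern selects a target group, and the default action handles everything else. This is a local model under stated assumptions, not the ELB API; rule and target group names are hypothetical, and real rules also support host-based conditions.

```javascript
'use strict';

// Converts a path pattern ("*" = any characters, "?" = exactly one) into a RegExp,
// escaping every other special character.
function patternToRegExp(pattern) {
    const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
    return new RegExp('^' + escaped.replace(/\*/g, '.*').replace(/\?/g, '.') + '$');
}

// Rules are evaluated by ascending priority; the first match picks the target group,
// otherwise the listener's default action applies.
function routeRequest(rules, defaultTargetGroup, path) {
    const ordered = rules.slice().sort((a, b) => a.priority - b.priority);
    const hit = ordered.find((r) => patternToRegExp(r.pathPattern).test(path));
    return hit ? hit.targetGroup : defaultTargetGroup;
}

// Hypothetical microservices behind one ALB
const rules = [
    { priority: 20, pathPattern: '/users/*', targetGroup: 'users-service' },
    { priority: 10, pathPattern: '/orders/*', targetGroup: 'orders-service' },
];
console.log(routeRequest(rules, 'web', '/orders/42')); // orders-service
console.log(routeRequest(rules, 'web', '/about'));     // web
```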

API Gateway

  • Remember that you need to enable CORS for browser clients to make successful requests to an API served from a different domain (origin).
  • You can use CloudTrail with API Gateway to capture REST API calls in your AWS account and deliver the log files to an Amazon S3 bucket.

IAM

  • IAM is a pretty solid topic in the exams, study this topic here.
  • Remember that it is advised to assign roles to EC2 Instances instead of storing credentials on them.
  • I got one question related to cross-account access, where the Development team wanted to access the Production environment; check Delegate Access Across AWS Accounts Using IAM Roles for more information about this scenario.

S3

  • Know the bucket URL format
    • http://<bucket>.s3-<aws-region>
  • Multipart upload
    • You can upload objects of up to 5 GB in a single PUT operation; from 5 GB to 5 TB, you must use multipart upload to avoid the “EntityTooLarge” error.
  • Glacier & Infrequent Access (IA)
  • Cross-Region Replication
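The multipart limits can be worked out with simple arithmetic. Assuming the documented S3 limits (single PUT up to 5 GB, parts of at least 5 MB except the last one, and at most 10,000 parts per upload), the sketch below picks the smallest allowed part size for a given object size. The helper is hypothetical and uses binary units (MiB/GiB).

```javascript
'use strict';

const MiB = 1024 * 1024;
const MIN_PART_SIZE = 5 * MiB;         // every part except the last must be at least 5 MB
const MAX_PARTS = 10000;               // maximum number of parts per multipart upload
const MAX_SINGLE_PUT = 5 * 1024 * MiB; // 5 GB limit for a single PUT

// Hypothetical helper: smallest part size (>= 5 MiB) that keeps the upload within 10,000 parts.
function planMultipartUpload(sizeBytes) {
    const partSize = Math.max(MIN_PART_SIZE, Math.ceil(sizeBytes / MAX_PARTS));
    return {
        multipartRequired: sizeBytes > MAX_SINGLE_PUT,
        partSize: partSize,
        partCount: Math.ceil(sizeBytes / partSize),
    };
}

// A 6 GiB object cannot be sent with a single PUT: 1229 parts of 5 MiB each.
console.log(planMultipartUpload(6 * 1024 * MiB));
```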

Shared Responsibility Model

  • Know your security responsibilities (most of the time related to access and security patches of your EC2s) vs. those from AWS.
  • When a storage device has reached the end of its useful life, AWS procedures include a decommissioning process, check the Storage Device Decommissioning topic in the AWS security whitepaper.

OpsWorks

  • Remember that if you need to use Chef, you can use the AWS OpsWorks service.

Storage solutions: Connect your enterprise network to AWS

  • Direct Connect: It will increase the speed and security of your infrastructure, but it may take a while to be fully implemented.
  • Storage Gateway: You have the Cached Volumes and Stored Volumes architectures.
  • AWS Import/Export: It is a service that accelerates data transfer into and out of AWS using physical storage appliances, bypassing the Internet.
    • You cannot export data from Glacier directly; you need to restore it to S3 first.

Final advice

  • I finished the exam without feeling comfortable about 3 or 4 questions, so I had to eliminate options instead of being sure about the correct answer for those.
  • If you are like me, you will use all the time and try to review the difficult questions several times. A note here: the questions and some answers are rather long, so you may not have enough time to check them all in a second round; use flags to mark those that you want to review.
  • Amazon exams do not have a fixed passing score; it varies depending on the scores obtained by other applicants. A colleague of mine passed with 77% the day before I took the exam. In December, another colleague got 67% and did not pass. Aim for at least 70%, but you should be OK with 75%.
  • Read the Amazon documentation for the topics that you do not understand. Even better, watch the re:Invent sessions on the AWS YouTube channel. Amazon repeats the popular sessions every year while adding some of the new features. I watched the ones related to VPCs and Auto Scaling a couple of times.
  • I read almost all the base documentation related to VPCs.

I hope this information helps you and others prepare for the exam, as it has been one of the most difficult ones that I have taken. I invite you to check the topics discussed in this article to find your weak points and reinforce them with the AWS FAQs, documentation, and re:Invent videos on YouTube. I found the last two to be the most effective way to understand the difficult topics, as both the documentation and the speakers are very good.


Where to study

Where to test your knowledge

Check other tips related to the AWS exams

Monday, February 19, 2018

MCD - API Design Associate tips

As MuleSoft partners, we at IO Connect Services care about constant education and certification for all our employees. In late January, I took and passed the exam for the MuleSoft Certified Developer - API Design Associate certification. MuleSoft recommends the Anypoint Platform: API Design course, which costs $1500 USD for 2 days. Here I’m sharing the findings that helped me pass this exam.

Preparation guide

It all starts with what is covered in the exam. As you may know by now, MuleSoft publishes a preparation guide for all the certifications they have. For this particular topic, you can find the course guide here.

This guide will help you identify the topics you need to know in order to pass this exam. In summary, you will have to know the following:
  • RESTful basics.
  • HTTP details in order to implement such APIs in a RESTful approach.
  • SOAP basics.
  • API-led connectivity lifecycle.
  • RAML 1.0.
  • Design APIs.
  • Define APIs using RAML 1.0.
  • Document APIs.
  • Secure APIs.
  • Test APIs.
  • Publish APIs on Anypoint Exchange.
  • Version best practices.

RESTful and SOAP basics.

REST and SOAP have been around for some time now. Nevertheless, it’s good to go back to basics from time to time. I’ve found this website, which gives clear statements about the basics and best practices for designing a RESTful application.

Make sure you understand the effect of the HTTP specification on a RESTful endpoint. HTTP status codes, headers, requests, responses, and more are covered in the exam. Get comfortable with this topic, as it’s very important for the design of an API.
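As a quick reference, the sketch below encodes common status code conventions for RESTful endpoints as a Node.js lookup table. These are widely used HTTP conventions, not a MuleSoft-specific rule, and the `successStatus` helper is hypothetical; in this model, methods outside the table are simply treated as not allowed.

```javascript
'use strict';

// Common success codes for RESTful endpoints (conventions, not a MuleSoft rule).
const SUCCESS_STATUS = {
    GET: 200,    // OK: the resource is returned in the body
    POST: 201,   // Created: usually with a Location header pointing to the new resource
    PUT: 200,    // OK (204 No Content is also common when nothing is returned)
    PATCH: 200,  // OK
    DELETE: 204, // No Content
};

// Error codes that frequently show up when designing an API.
const ERROR_STATUS = {
    badRequest: 400,
    unauthorized: 401,
    forbidden: 403,
    notFound: 404,
    methodNotAllowed: 405,
    notAcceptable: 406,        // cannot produce the media type requested in Accept
    unsupportedMediaType: 415, // request Content-Type is not supported
    internalServerError: 500,
};

// Hypothetical helper: status for a successful call in this model API.
function successStatus(method) {
    const code = SUCCESS_STATUS[method.toUpperCase()];
    return code === undefined ? ERROR_STATUS.methodNotAllowed : code;
}

console.log(successStatus('post')); // 201
```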

Unlike REST, which is an architectural style, SOAP is an industry standard protocol; one good reference is the W3C website:

RAML 1.0

RESTful API Modeling Language, or RAML, is a standard to document APIs for RESTful applications. MuleSoft uses this standard in order to design and define APIs in Anypoint Platform and in the Mule runtime.

The first place you should look at is the specification itself.

But if you’re a tutorial person, you can use the RAML 1.0 tutorial.

Make sure you can write an API in RAML with nothing but a notepad. You will be given RAML snippets and will have to answer questions based on them, which means you have to know whether the syntax is correct and what those snippets mean for the question. Expect many questions on syntax, design best practices, versioning best practices, and security. This is one of the most important topics in the exam, as it’s the core of API design in Mule.

One more thing: I’ve found a lot of people who think that REST means JSON. This is not true at all. While JSON is widely used in RESTful APIs, REST also supports XML and other payload formats, identified through the Content-Type header. This is particularly relevant for RAML, where you can declare a different body for each media type in the document.
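To illustrate that REST is not tied to JSON, here is a hypothetical Node.js sketch that renders the same resource as JSON or XML depending on the requested media type (Content-Type describes a request body, Accept the desired response). It only illustrates the idea; it is not how Mule implements it, and the XML rendering handles flat objects only.

```javascript
'use strict';

// Escapes the characters that are not allowed as-is in XML text.
function escapeXml(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Minimal XML rendering for flat objects; real APIs would use a proper XML library.
function toXml(rootName, obj) {
    const fields = Object.keys(obj)
        .map((key) => '<' + key + '>' + escapeXml(obj[key]) + '</' + key + '>')
        .join('');
    return '<' + rootName + '>' + fields + '</' + rootName + '>';
}

// Same resource, two representations, chosen by media type.
function serialize(resource, mediaType) {
    switch (mediaType) {
        case 'application/json':
            return JSON.stringify(resource);
        case 'application/xml':
            return toXml('resource', resource);
        default:
            throw new Error('Unsupported media type: ' + mediaType);
    }
}

const song = { id: 1, title: 'Rock & Roll' };
console.log(serialize(song, 'application/json'));
console.log(serialize(song, 'application/xml'));
```

In a RAML document, the equivalent is declaring both `application/json` and `application/xml` under a method's body or response.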

API-led connectivity and lifecycle

MuleSoft has a set of best practices for APIs. This is very well documented as API-led connectivity. You can get a quick view here.

Also, MuleSoft has very specific products and practices to manage the lifecycle of your APIs through the Anypoint Platform, such as API designer, API portal and Exchange. Make sure you know all these products inside out. As part of the lifecycle management, be sure you understand the role of each product in it. To start looking into these components, see this link:

One resource I discovered recently is API Notebook, a tool for writing API tutorials that you can share with your peers and that runs JavaScript snippets.

Be sure to know its API; you can find it here:


In my experience taking this exam, I noticed that the HTTP and RAML specifications are covered extensively. On the HTTP side, I got a bunch of questions about status codes, request formats, responses, and headers needed to define an API properly.

I strongly advise you to get familiar with the API lifecycle management products in Anypoint Platform. Moreover, do your own study projects with these products: design an API from scratch using API Designer, publish it, and make it discoverable within your organization. This will help you understand MuleSoft’s practices and products while you study the specs, and it will save you some time.

Let me know about your experience with this exam. Write a comment and let’s help others looking for help on this topic.


  • IO Connect Services
  • IO Connect Services - MuleSoft partnership
  • API Design course overview
  • API Design course guide
  • REST API Tutorial website
  • W3C SOAP tutorial
  • RAML Specification
  • RAML 1.0 tutorial
  • API-led connectivity
  • API Lifecycle management
  • API Notebook
  • API Notebook guide