How to create your own Puppeteer-as-a-service using NodeJS and Puppeteer?

Ayush Agarwal
Apr 21, 2023
Introduction
Nearly a month ago, we solved a very hard problem for our product: automating auth token generation for a website whose login involves multiple steps. We solved it in more than one way, but tackling it with Puppeteer was especially fun and challenging. In this blog, I will share how to set up a new Node.js service for Puppeteer and what I learned while doing it.
What exactly is Puppeteer?
Puppeteer is a Node.js library that comes in handy when you need to control a browser instance. It's useful for web scraping, automating workflows, and automating tests. Whenever you run a script, it launches a new browser instance (Chromium by default) and can replicate exact user workflows. This means it can do something as simple as clicking a button, or it can replicate complex flows, such as logging into an application or going to an e-commerce website, searching for an item, and adding it to your cart. In short, Puppeteer can be really helpful for automating tests, especially when the scenario involves the UI of the application. I cannot say enough about how helpful it is.
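To make the "logging into an application" case concrete, here is a minimal sketch of a scripted login. The URL, the CSS selectors, and the credentials are hypothetical placeholders, not values from our product. The `login` function only needs a Puppeteer `page` object, so it can be reused with any browser instance.

```javascript
// Minimal sketch of a scripted login flow.
// The selectors ("#username", "#password", "button[type=submit]") are
// hypothetical; adjust them to the page you are automating.
async function login(page, { url, username, password }) {
  await page.goto(url, { waitUntil: "networkidle2" });
  await page.type("#username", username);
  await page.type("#password", password);
  // Start waiting for the navigation before clicking, so it is not missed.
  await Promise.all([
    page.waitForNavigation({ waitUntil: "networkidle2" }),
    page.click("button[type=submit]"),
  ]);
  return page.url();
}

async function main() {
  // Loaded lazily so that `login` can be exercised without a browser.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    const landedOn = await login(page, {
      url: "https://example.com/login", // placeholder URL
      username: "demo-user",            // placeholder credentials
      password: "demo-password",
    });
    console.log("Landed on:", landedOn);
  } finally {
    await browser.close();
  }
}
```

Calling `main()` runs the flow in a real browser. Real login pages often need extra waits or multiple steps, which is exactly where things got interesting for us.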

How we used Puppeteer at Akto
It started a month ago. We had to automate the login workflow for a web app. While doing that, we ran into a lot of problems that were becoming very hard for us to solve.
Problem statement: How do you automate the login flow for a web app with minimal input and configuration from users?
Without delving into the specifics of what we tried: I spent days brainstorming, and finally someone on the team recommended Puppeteer. Life was magic from that point onwards. I will explain why we chose Puppeteer in the next section.
Why Puppeteer?
Puppeteer seemed tailor-made for the situation I described above. It's super simple to set up: a single command installs it (prerequisite: Node.js must be installed).
"npm i puppeteer"
Puppeteer runs the browser in headless mode by default. Headless means that the browser GUI is not shown and everything runs in the background. This is faster, because the browser does not have to draw a window on screen. Headless mode can be disabled by passing headless: false in the options given to puppeteer.launch().
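A minimal sketch of toggling headless mode is shown below. The `showBrowser` flag and the helper names are hypothetical, introduced only for this example; `headless` itself is a standard `puppeteer.launch()` option.

```javascript
// Builds the options object passed to puppeteer.launch().
// `showBrowser` is a hypothetical flag introduced for this sketch.
function buildLaunchOptions(showBrowser = false) {
  return {
    // headless: true hides the browser window;
    // headless: false opens a visible window, which helps when debugging.
    headless: !showBrowser,
  };
}

// Opens a URL and returns the page title.
async function openPage(url, showBrowser = false) {
  // Loaded lazily, only when a browser is actually needed.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch(buildLaunchOptions(showBrowser));
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.title();
  } finally {
    await browser.close();
  }
}
```

For example, `openPage("https://example.com", true)` should open a visible browser window while you debug, and leaving out the second argument keeps the run in the background.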

How did we use Puppeteer?
There was one problem in the implementation: automating the login flow and running tests happened in two separate services. So we had two approaches in mind:
1. Adding Puppeteer dependencies to both services.
2. Setting up a new service that would run the Puppeteer script on any input provided.
We brainstormed and went ahead with approach #2 for the following reasons:
1. A separate service means that any other service can use the Puppeteer logic without installing new dependencies or modifying its Dockerfile.
2. We want to support the arm64 architecture, and at the time of writing neither Google Chrome nor the Chromium build that Puppeteer downloads was available for arm64 on Linux. Keeping the browser in one service confines that constraint to a single place.
Setting up the new service
We set up a new Dockerized Node.js service. Steps:
1. Create an empty directory and add a package.json file inside it.

Next, we added the Puppeteer dependency to our new module. We declared it in the package.json file so that it gets installed later by our Dockerfile.
2. Declare the Puppeteer dependency in your package.json.
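As a rough sketch, a package.json for such a service might look like the following. The package name, the version numbers, and the start script are assumptions made for illustration; pin whichever Puppeteer version you have tested against.

```json
{
  "name": "puppeteer-service",
  "version": "1.0.0",
  "private": true,
  "main": "example.js",
  "scripts": {
    "start": "node example.js"
  },
  "dependencies": {
    "puppeteer": "^19.0.0"
  }
}
```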

Your new service is set up now! Next we will write a server file.
Writing server file
We wrote a new server file that listens on port 3000. We created it inside our module (for example, example.js) and wrote the server script in it. Steps:
1. Create a new server file inside your module.
2. Write the server script inside that file.
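Here is a minimal sketch of what such a server file could look like. It is not our exact implementation: it uses Node's built-in `http` module, the `/run` path and the response shape are hypothetical, and it assumes the request body is a Chrome DevTools Recorder export that gets replayed with the `@puppeteer/replay` package (which would need to be installed next to `puppeteer`).

```javascript
const http = require("http");

// Default runner: replays a Recorder export in a headless browser.
// Assumes `puppeteer` and `@puppeteer/replay` are installed.
async function replayRecording(recording) {
  const puppeteer = require("puppeteer");
  const { createRunner, parse, PuppeteerRunnerExtension } =
    await import("@puppeteer/replay");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    const runner = await createRunner(
      parse(recording),
      new PuppeteerRunnerExtension(browser, page)
    );
    await runner.run();
    return { finalUrl: page.url() }; // hypothetical response shape
  } finally {
    await browser.close();
  }
}

// Collects the request body and parses it as JSON.
function readJson(req) {
  return new Promise((resolve, reject) => {
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      try {
        resolve(JSON.parse(body));
      } catch (err) {
        reject(err);
      }
    });
    req.on("error", reject);
  });
}

// `runner` is injectable so the HTTP layer can be tried without a browser.
function createServer(runner = replayRecording) {
  return http.createServer(async (req, res) => {
    if (req.method !== "POST" || req.url !== "/run") {
      res.writeHead(404, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "not found" }));
      return;
    }
    try {
      const result = await runner(await readJson(req));
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(result));
    } catch (err) {
      res.writeHead(500, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: String(err.message || err) }));
    }
  });
}

// To start the service on port 3000: createServer().listen(3000);
```

Passing the runner in as an argument is a design choice for this sketch: it lets you check the routing and error handling with a stub before a real browser is involved.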

Putting All The Pieces Together
Finally, it's time to write our Dockerfile :) I used Alpine as the base image. Steps:
1. Create a new Dockerfile inside the module.
2. Add the build instructions to it.
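A minimal sketch of an Alpine-based Dockerfile for a service like this follows. The base image tag, the package list, and the file names are assumptions; the two `PUPPETEER_*` environment variables tell Puppeteer to skip its own browser download and use the Chromium installed from Alpine's package repository instead.

```dockerfile
# Sketch only: base image tag, package list, and file names are assumptions.
FROM node:18-alpine

# Install Chromium and the fonts/libraries it needs from Alpine's packages.
RUN apk add --no-cache chromium nss freetype harfbuzz ca-certificates ttf-freefont

# Skip Puppeteer's bundled browser download and point it at the system binary.
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

WORKDIR /app
COPY package*.json ./
RUN npm install --omit=dev
COPY . .

EXPOSE 3000
CMD ["node", "example.js"]
```

Note that Chromium usually refuses to start as the root user with its sandbox enabled, so a setup like this typically also needs either a non-root user in the image or adjusted launch arguments. The Puppeteer troubleshooting guide covers both options.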

You’re done! You have now set up a service that can execute Puppeteer scripts.
Alternate Way To Setup
The above steps can be a lot to take in at first glance :) Instead of performing them, you can also run the Puppeteer service directly in a Docker container using our image. The steps below will spawn a new Docker container that listens on port 3000.

Testing It Out
1. Open your Chrome browser and go to developer tools. Notice there is a Recorder tab, which can be used for creating recordings.
2. Export the recording as a JSON script.
3. Send the exported script to the service with a curl command.
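If you would rather send the recording from Node.js than from curl, the request can be sketched as below. This is only an illustration: the `/run` path, the default host and port, and the shape of the response are hypothetical, so substitute whatever endpoint your service actually exposes.

```javascript
const http = require("http");

// POSTs a Recorder export to the service and resolves with the parsed reply.
// The "/run" path and the JSON response are hypothetical.
function sendRecording(recording, { host = "localhost", port = 3000, path = "/run" } = {}) {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify(recording);
    const req = http.request(
      {
        host,
        port,
        path,
        method: "POST",
        agent: false, // one-off connection, no keep-alive
        headers: {
          "Content-Type": "application/json",
          "Content-Length": Buffer.byteLength(body),
        },
      },
      (res) => {
        let text = "";
        res.on("data", (chunk) => (text += chunk));
        res.on("end", () => {
          try {
            resolve({ status: res.statusCode, body: JSON.parse(text) });
          } catch (err) {
            reject(err);
          }
        });
      }
    );
    req.on("error", reject);
    req.end(body);
  });
}

// Example usage with an exported file named recording.json:
//   const recording = JSON.parse(require("fs").readFileSync("recording.json", "utf8"));
//   sendRecording(recording).then(console.log, console.error);
```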

You can also check out our source code here - puppeteer-replay
What did I learn?
I was completely new to the world of automating browser actions, and learning it hands-on by building such a complex feature was a great experience. Puppeteer has some really cool use cases, from automating UI-based tests to much more. By implementing the login flow with Puppeteer, I believe we have only scratched the surface, and hopefully I'll get to explore more of the library in the future.
Conclusion
There you go! This is how I used Puppeteer to solve a very hard problem in a super easy way. I cannot recommend this approach highly enough. Follow the steps above and you will have developed Puppeteer as a service in no time! Feel free to reach out to me in case of any issues; I would be happy to help.
We are solving cutting-edge problems at Akto every day! If you want to learn more, check out our engineering blogs.