How to create your own Puppeteer-as-a-service using NodeJS and Puppeteer?
Ayush Agarwal
Introduction to Puppeteer JS
Nearly a month ago we solved a very hard problem for our product: automating auth token generation for a website whose login involves multiple steps. We solved it in more than one way, but tackling it with Puppeteer was really fun and challenging. In this blog, I will share how to set up a new Node.js service for Puppeteer and what I learned doing it.
What exactly is Puppeteer JS?
Puppeteer is a Node library that comes in handy when you need to control a browser instance. It's useful for web scraping, automating workflows, and automating tests. Whenever you run a script, it launches a new browser instance (Chromium by default) and can replicate exact user workflows. This means it can do something as simple as clicking a button, or it can replicate complex flows, such as logging into an application, or going to an e-commerce website, searching for an item, and adding it to your cart. In short, Puppeteer can be really helpful for automating tests, especially when the scenario involves the UI of the application. I cannot talk enough about how helpful it is.
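To make the e-commerce example above a little more concrete, here is a minimal sketch of such a flow. The shop URL, the CSS selectors, and the function name searchAndAddToCart are hypothetical placeholders I made up for illustration; only page.goto, page.type, page.click, and page.waitForSelector come from Puppeteer's Page API.

```javascript
// Sketch: replicate a simple user workflow on a Puppeteer `page`.
// All selectors and the shop URL are made-up placeholders.
async function searchAndAddToCart(page, { shopUrl, query }) {
  await page.goto(shopUrl);                       // open the site
  await page.type('#search-input', query);        // type into the search box
  await page.click('#search-button');             // submit the search
  await page.waitForSelector('.result-item');     // wait for results to render
  await page.click('.result-item .add-to-cart');  // click the first matching "add to cart" button
}

// Typical wiring (requires `npm i puppeteer`); nothing runs until main() is called.
async function main() {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await searchAndAddToCart(page, {
      shopUrl: 'https://shop.example.com',
      query: 'headphones',
    });
  } finally {
    await browser.close(); // always release the browser, even on failure
  }
}
```

Keeping the flow in a function that receives the page makes it easy to reuse the same steps against a real browser or a stub.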
How did we use Puppeteer at Akto?
It started a month ago. We had to automate the login workflow for a web app. Along the way we ran into a lot of problems that were becoming super hard for us to solve.
Problem statement: How do you automate login flow for a web app with minimal inputs and configurations from users?
Without delving into the specifics of what we tried: I spent days brainstorming, and finally someone on the team recommended Puppeteer to solve this. Life was magic from that point onwards. I will tell you why we chose Puppeteer in the next section.
Why Puppeteer?
Puppeteer seemed tailor-made for the situation I described above. It's super simple to set up. With Node installed, a single command does it:
"npm i puppeteer"
Puppeteer launches the browser in headless mode by default. Headless means that the browser GUI is not shown and everything runs in the background. This is usually faster, because no window has to be drawn on screen (the page itself, including its CSS, is still loaded). Headless mode can be disabled in the script by passing headless: false in the options given to puppeteer.launch().
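As a small sketch of how the headless switch could be wired up, the snippet below picks the mode from an environment variable. SHOW_BROWSER and the two function names are hypothetical choices for this example; the headless launch option itself is Puppeteer's.

```javascript
// Sketch: decide between headless and headed mode from an environment variable.
// SHOW_BROWSER is a hypothetical variable name chosen for this example.
function buildLaunchOptions(env = process.env) {
  const showBrowser = env.SHOW_BROWSER === 'true';
  return {
    headless: !showBrowser, // false => a visible browser window opens
  };
}

// Launch a browser with those options (requires `npm i puppeteer`).
async function launchBrowser() {
  const puppeteer = require('puppeteer');
  return puppeteer.launch(buildLaunchOptions());
}
```

Running with SHOW_BROWSER=true is handy while debugging a flow locally; inside a container you would normally leave it unset so the browser stays headless.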
How did we use Puppeteer?
There was a problem in the implementation! Automating the login flow and running tests happened in two separate services. So we had two approaches in mind:
1. Adding Puppeteer dependencies to both services.
2. Setting up a new service which would run the Puppeteer script on any input provided.
We brainstormed and went ahead with approach #2 for the following reasons:
1. With a new service, any other service can use the Puppeteer logic without installing new dependencies or modifying its Dockerfile.
2. We want to support the arm64 architecture, and at the time of writing Google Chrome and Chromium did not support arm64 for us. Keeping the browser in one service confines that limitation to a single place.
Setting up the new service
We set up a new dockerized Node service. Steps:
1. Create an empty directory and add a package.json file inside it.
2. Declare puppeteer as a dependency in that package.json, so that it is installed later when the Dockerfile builds the image.
Your new service is set up now! Next we will write a server file.
Writing server file
We wrote a new server file which listens on port 3000. Steps:
1. Create a new server file inside your module (for example, example.js).
2. Write the server script inside that file.
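For readers who want a starting point, here is a minimal sketch of what such a server file could look like, using only Node's built-in http module. This is not the exact file we run: the POST /run route, the function names, and the response shape are assumptions for illustration, and the actual Puppeteer work is passed in as a runRecording function so the HTTP part stays independent of it.

```javascript
const http = require('http');

// Core request logic, kept separate from the HTTP plumbing so it is easy to test.
// `runRecording` is an injected async function (hypothetical) that executes the
// script with Puppeteer and returns a JSON-serialisable result.
async function handleRunRequest(bodyText, runRecording) {
  let recording;
  try {
    recording = JSON.parse(bodyText);
  } catch (err) {
    return { status: 400, body: { error: 'Request body must be valid JSON' } };
  }
  try {
    const result = await runRecording(recording);
    return { status: 200, body: { result } };
  } catch (err) {
    return { status: 500, body: { error: String(err.message || err) } };
  }
}

// Wrap the handler in a plain Node HTTP server. POST /run is a hypothetical route.
function createServer(runRecording) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/run') {
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Not found' }));
      return;
    }
    let bodyText = '';
    req.setEncoding('utf8');
    req.on('data', (chunk) => { bodyText += chunk; });
    req.on('end', async () => {
      const { status, body } = await handleRunRequest(bodyText, runRecording);
      res.writeHead(status, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(body));
    });
  });
}

// Example start-up (not executed here):
// createServer(myRunRecording).listen(3000);
```

Splitting handleRunRequest from createServer is a design choice for this sketch: the status-code logic can be exercised without opening a port or launching a browser.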
Putting All The Pieces Together
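Finally, it's time to write our Dockerfile :) I used Alpine as the base image. Steps:
1. Create a new Dockerfile inside the module.
2. Add the build instructions to it: start from the Alpine base image, install the dependencies declared in package.json, and start the server on port 3000.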
You’re done! You have now set up a service which can execute puppeteer scripts.
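To give an idea of what "executing a Puppeteer script" can mean inside such a service, here is a simplified sketch that replays a recording exported from the Chrome DevTools Recorder. It is a stand-in for illustration, not our actual implementation: it only handles the setViewport, navigate, click, and change step types, and it only uses plain single-part CSS selectors. The function names firstCssSelector and runRecording are hypothetical.

```javascript
// Pick the first plain CSS selector from a Recorder step. Recorder selectors are
// nested arrays and may include aria/, xpath/, text/ or pierce/ variants, which
// this sketch skips. Multi-part (shadow DOM) selectors are not handled.
function firstCssSelector(step) {
  for (const candidate of step.selectors || []) {
    const selector = Array.isArray(candidate) ? candidate[0] : candidate;
    if (typeof selector === 'string' && !/^(aria|xpath|text|pierce)\//.test(selector)) {
      return selector;
    }
  }
  throw new Error(`No CSS selector found for step of type "${step.type}"`);
}

// Replay the recording step by step on a Puppeteer `page`.
async function runRecording(page, recording) {
  for (const step of recording.steps || []) {
    switch (step.type) {
      case 'setViewport':
        await page.setViewport({ width: step.width, height: step.height });
        break;
      case 'navigate':
        await page.goto(step.url);
        break;
      case 'click':
        await page.click(firstCssSelector(step));
        break;
      case 'change':
        // Approximation: types the recorded value into the field.
        await page.type(firstCssSelector(step), step.value);
        break;
      default:
        throw new Error(`Unsupported step type: ${step.type}`);
    }
  }
}
```

Throwing on unknown step types is deliberate in this sketch: a login flow that silently skips a step is harder to debug than one that fails loudly.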
Alternate Way To Set Up
The above steps can be a lot to take in at first glance :) Instead of performing them, you can also run the Puppeteer service directly in a Docker container using our image. This spawns a new Docker container which listens on port 3000.
Testing It Out
1. Open your Chrome browser and go to developer tools. Notice there is a Recorder tab, which can be used for creating recordings.
2. Record your flow and export the recording as a JSON script.
3. Send the exported JSON to the service on port 3000 with a curl command.
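If you would rather call the service from Node than from curl, the request could look roughly like the sketch below. The /run path, the localhost base URL, and the function names are assumptions for illustration, so adjust them to whatever your server file exposes. The exampleRecording object shows the general shape of a Recorder export, with made-up values.

```javascript
// Build the HTTP request that sends an exported recording to the service.
// The /run path and the localhost base URL are assumptions for this sketch.
function buildRunRequest(recording, baseUrl = 'http://localhost:3000') {
  return {
    url: new URL('/run', baseUrl).toString(),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(recording),
    },
  };
}

// Send the recording; uses the global fetch available in Node 18+.
async function sendRecording(recording, baseUrl) {
  const { url, options } = buildRunRequest(recording, baseUrl);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Service responded with HTTP ${response.status}`);
  }
  return response.json();
}

// General shape of a recording exported from the Recorder tab (values are made up).
const exampleRecording = {
  title: 'Login flow',
  steps: [
    { type: 'navigate', url: 'https://app.example.com/login' },
    { type: 'change', selectors: [['#email']], value: 'user@example.com' },
    { type: 'click', selectors: [['#submit']] },
  ],
};
```

This is also what reason #1 for the separate service looks like in practice: the calling service needs nothing beyond an HTTP client.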
You can also check out our source code here - puppeteer-replay
What did I learn?
I was completely new to the world of automating browser actions, and learning it hands-on by building such a complex feature was a great experience. Puppeteer has some really cool use cases, such as automating UI-based tests and more. By implementing the login flow with Puppeteer, I believe we have just scratched the surface, and hopefully I'll get to explore more of the library in the future.
Conclusion
There you go! This is how I used Puppeteer to solve a very hard problem in a super easy way. I cannot recommend this approach highly enough. Follow the steps above and you will have developed Puppeteer as a service in no time! Feel free to reach out to me in case of any issues. I would be happy to help.
We are solving cutting-edge problems at Akto every day! If you want to learn more, check out our engineering blogs.
References
https://github.com/puppeteer/puppeteer
https://medium.com/swlh/an-introduction-to-web-scraping-with-puppeteer-3d35a51fdca0