Home
Welcome to the Rekrut wiki! Rekrut is a Chrome extension that scrapes LinkedIn profile pages to extract information useful to recruiters. The extension makes use of a Node.js library and MongoDB, along with several internal dependencies. The server code is provided inside the 'testing' folder of the repository. Move it into its own location after you download the repo.
If you want to help improve the node traversal code, I have provided tree figures for reference in the documentation folder.
The root label of the above diagram is the class name of the node. Beyond the root I have not checked for id or class, because I traverse using the DOM node attributes. I have, however, used some unusual notation within the labels, which I explain here using the above diagram as reference.
- If a label is followed by a `#` or a `.`, the text after the symbol is the node's ID or className, respectively.
- Notice that the two children `li` and `div` of `ul` carry the numbers 1.1 and 1.2 in parentheses. This indicates that the nodes we traverse within a `ul` list can be either `li` or `div`.
- `*..*` indicates that there can be n of these nodes. It is separated from the `(1.1)` by a `|`. Whenever a tree node has more than one characteristic attribute, I have used `|` to separate them.
- `{}` wrapped around a node label indicates that the node might be missing from the document in some cases. It is paramount that you check for this when you traverse the nodes.
- `<>` encloses the actual value that is extracted from the node. This usually appears on leaf nodes, because traversal stops once we reach the node to be extracted.
- `[]` provides a reference to a new diagram, used where there was not enough space in the original tree diagram.
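
The notation above maps onto a few traversal habits, which can be sketched roughly as follows. This is only an illustration, not the code used in Rekrut: the helper names (`byTag`, `byClass`, `walk`, `collect`, `el`) and the sample tree with its `profile`, `skills` and `summary` class names are all hypothetical, and plain objects stand in for DOM elements so the snippet can run under Node.js.

```javascript
// Sketch only: helper names and the sample tree are hypothetical,
// not taken from the Rekrut source.

// Predicate factories for one traversal step.
const byTag = (...tags) => (node) =>
  tags.includes(String(node.tagName).toLowerCase());
const byClass = (name) => (node) =>
  typeof node.className === "string" &&
  node.className.split(/\s+/).includes(name);

// Follow one matching child per step. Returns null as soon as a step
// has no match, which covers nodes marked {} (possibly missing).
function walk(root, steps) {
  let node = root;
  for (const matches of steps) {
    if (!node) return null;
    node = Array.from(node.children || []).find(matches) || null;
  }
  return node;
}

// Collect every matching child, for *..* nodes that repeat n times.
function collect(parent, matches) {
  if (!parent) return [];
  return Array.from(parent.children || []).filter(matches);
}

// Plain objects standing in for DOM elements so the sketch runs in Node.
const el = (tagName, className, children = [], textContent = "") =>
  ({ tagName, className, children, textContent });

const profile = el("DIV", "profile", [
  el("UL", "skills", [
    el("LI", "", [], "JavaScript"),
    el("DIV", "", [], "MongoDB"),
  ]),
]);

// ul children may be li (1.1) or div (1.2), and there can be n of them.
const list = walk(profile, [byClass("skills")]);
// The <> part of a leaf label is the value read from the node.
const skills = collect(list, byTag("li", "div")).map((n) => n.textContent);

// A {} node that is absent yields null rather than throwing.
const missing = walk(profile, [byClass("summary"), byTag("p")]);

console.log(skills, missing);
```

Returning `null` from `walk` keeps the missing-node check in one place, so callers only need to test the final result before extracting a value.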