Discussion:
[chromium-discuss] Where to find entire HTML content in Chromium source code
j***@gmail.com
2018-08-30 19:01:02 UTC
I am currently trying to do this: once the webpage loads, find out whether the
URL matches a certain pattern (say www.wikipedia.com/*), and if so, parse
the HTML content of that webpage the way one can with BeautifulSoup, and
check whether the webpage has a div with class foo and id boo. Any idea where
I can write this code? That is, where can I get access to the URL, what do I
need to listen to in order to know that the webpage has finished loading (so
that I can then look at the URL and HTML content), and where and how can I
parse the HTML?


I tried going through the code in src/chrome/browser/tab_contents, but I could
not find any reasonable place where I can do all this.
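A minimal sketch of the URL test described above, written in JavaScript (the language later replies in this thread suggest). The name `matchesTargetUrl` and its default arguments are hypothetical; they only encode the `www.wikipedia.com/*` pattern from the question, reading `*` as "any path on that host". `URL` is the standard WHATWG URL class available in browsers and Node.

```javascript
// Hypothetical helper: does `href` fall under the pattern "www.wikipedia.com/*"?
// Host and path prefix are parameters so other patterns can be tried.
function matchesTargetUrl(href, host = "www.wikipedia.com", pathPrefix = "/") {
  let url;
  try {
    // Throws a TypeError for strings that are not absolute URLs.
    url = new URL(href);
  } catch (err) {
    return false;
  }
  // Only consider ordinary web pages.
  const isWeb = url.protocol === "http:" || url.protocol === "https:";
  return isWeb && url.hostname === host && url.pathname.startsWith(pathPrefix);
}

console.log(matchesTargetUrl("https://www.wikipedia.com/wiki/Chromium")); // true
console.log(matchesTargetUrl("https://en.wikipedia.org/wiki/Chromium"));  // false
```

In a page script, `href` would typically be `window.location.href`.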
--
--
Chromium Discussion mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-discuss

---
You received this message because you are subscribed to the Google Groups "Chromium-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-discuss+***@chromium.org.
PhistucK
2018-09-02 07:12:53 UTC
Do you want the server-provided source code (the response of the server),
or the dynamic content (after DOM manipulation and so on)?
For the former, you need to look for the code that passes content provided
by the network stack to the renderer (I believe it provides mostly unparsed
responses). Note: I think there are two primary code paths for this at the
moment, because the team is working on a network service as an alternative
to the current way (I am not sure what that is called).
For the latter, the Autofill feature does something similar (it fills form
fields with previously saved usernames/passwords automatically on page load
by traversing the DOM), so look for code paths that involve autofill.

You can use cs.chromium.org to quickly and efficiently search the code and
get to callers, definitions and so on.

☆*PhistucK*
j***@gmail.com
2018-09-02 22:36:08 UTC
Thanks PhistucK! Yes, I want the DOM-manipulated one, the full HTML of the
page. I am as new to Chromium as I am to Autofill within Chromium (LOL), so I
would appreciate it if you could point out a place in the autofill code that
lets me hook into the full HTML!
Jon Perryman
2018-09-03 00:54:09 UTC
Maybe you could write a Chrome extension that injects a script into the
page. JavaScript has access to the DOM, which has the information you want
(including a way to wait for loading to complete). This would make your code
a little more portable to other browsers if needed, and would avoid modifying
Chrome.

Regards, Jon.
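To illustrate the "waiting for load complete" part, here is a minimal sketch of a script that defers its work until the page has finished loading. `onPageLoaded` is a hypothetical helper name; `document.readyState` and the window `load` event are standard DOM features. In an extension, the `run_at` setting of a content script in the manifest can serve a similar purpose. Note that DOM changes made by page scripts after `load` are not covered by this.

```javascript
// Hypothetical helper: run `callback` once the page has fully loaded.
// `doc` and `win` are passed in (normally `document` and `window`) so the
// logic itself does not depend on running inside a browser.
function onPageLoaded(doc, win, callback) {
  if (doc.readyState === "complete") {
    // The document has already finished loading, so run immediately.
    callback();
  } else {
    // Otherwise wait for the window's load event, reacting to it only once.
    win.addEventListener("load", () => callback(), { once: true });
  }
}

// In a page or content script:
if (typeof window !== "undefined" && typeof document !== "undefined") {
  onPageLoaded(document, window, () => {
    console.log("Page loaded:", window.location.href);
  });
}
```

Passing `doc` and `win` explicitly is only a design choice for testability; inside a real script you would simply use the globals.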
J Decker
2018-09-03 01:03:05 UTC
Isn't something like `document.body.innerHTML` the whole document in HTML?
kai zhu
2018-09-03 01:21:14 UTC
The entire HTML is `window.document.documentElement.outerHTML`.

Here is a real-world example using the above, where Electron saves the entire client-generated HTML (in addition to a screenshot) [1].

[1] Source code that captures window.document.documentElement.outerHTML in Electron:
https://github.com/kaizhu256/node-utility2/blob/b2576f6/lib.utility2.js#L2983
kai zhu
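For reference, a small sketch contrasting the two properties mentioned above: `document.body.innerHTML` covers only the children of `<body>`, while `document.documentElement.outerHTML` includes the `<html>` element itself plus `<head>` and `<body>`. Neither includes the doctype, so the hypothetical helper `fullDocumentHtml` below prepends a simplified one (it ignores any public or system identifiers).

```javascript
// Hypothetical helper: serialize the current DOM of `doc` (normally `document`)
// as one HTML string, prefixed by a simplified doctype if the page has one.
function fullDocumentHtml(doc) {
  const dt = doc.doctype; // null when the document has no doctype
  const doctype = dt ? `<!DOCTYPE ${dt.name}>` : "";
  // documentElement is the <html> element; outerHTML includes its own tags.
  return doctype + doc.documentElement.outerHTML;
}

if (typeof document !== "undefined") {
  const html = fullDocumentHtml(document);
  console.log(`captured ${html.length} characters`);
}
```

Because this reads the live DOM, it reflects changes made by scripts up to the moment it is called, i.e. the "DOM manipulated" version asked about earlier in the thread.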
Jon Perryman
2018-09-03 03:09:43 UTC
I think the OP only needs simple JavaScript (nothing as complicated as the
example provided). "window.location" gives the URL. Functions for finding
elements by class ("getElementsByClassName") and by ID ("getElementById")
already exist. If traversing the tree is needed, that is already supported
too (siblings and children). Chrome and Firefox have JavaScript debuggers to
help with finding problems.

Regards, Jon.
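Following that suggestion, the check from the original question could look roughly like this. `hasTargetDiv` is a hypothetical name; `getElementById`, `tagName`, `classList.contains`, `querySelector` and `window.location` are standard DOM APIs, and the values `boo`, `foo` and `www.wikipedia.com` come from the question.

```javascript
// Hypothetical helper: does `doc` contain a <div> with id "boo" and class "foo"?
function hasTargetDiv(doc, id = "boo", className = "foo") {
  // IDs are meant to be unique, so getElementById returns at most one element.
  const el = doc.getElementById(id);
  // tagName is upper-case ("DIV") for elements in HTML documents.
  // For simple id/class names, a roughly equivalent browser one-liner is:
  //   doc.querySelector(`div#${id}.${className}`) !== null
  return el !== null && el.tagName === "DIV" && el.classList.contains(className);
}

// In a page or content script, combined with the URL from window.location:
if (typeof window !== "undefined" && typeof document !== "undefined") {
  const onTargetSite = window.location.hostname === "www.wikipedia.com";
  if (onTargetSite && hasTargetDiv(document)) {
    console.log("Found div#boo.foo on", window.location.href);
  }
}
```

Looking the element up by ID first and then checking its tag and class avoids scanning every element with class `foo` via `getElementsByClassName`.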
j***@gmail.com
2018-09-03 10:06:12 UTC
Thanks guys! No, I want to do it specifically within the Chromium code
(using C++). Using JavaScript would require me to make an extension, right?
I do not want to go that route.
PhistucK
2018-09-03 11:02:53 UTC
I cannot help you much offhand (I do not work on Chromium code), but I
advise you to use cs.chromium.org to find where autofill interacts with the
content layer (for example, start by searching for `autofill file:/content/`
and go on from there).

☆*PhistucK*
Jon Perryman
2018-09-03 17:34:35 UTC
Extensions are a documented API for extending Chrome, which makes your code
less affected by changes to Chrome's source code. Additionally, the API lets
you make your changes without having to understand their full impact on
Chrome. The same changes could be made directly in the source code, but then
you must understand more of the internals.

I suspect that autofill can be called multiple times at different phases
(e.g. when JavaScript modifies an element's contents). If you insert your
code there, you should verify that it fully meets your requirements.

You seem fixated on a very specific implementation rather than on a design
with requirements. This makes me assume that your code will only be
available to a very small group of people. That is fine as long as it meets
your requirements, but it may become a problem if you expand your target
audience.

There are many perfectly acceptable ways to solve your problem. Choose the
method that meets your goals (even if that goal is to learn the Chrome
source code). The method I suggested comes at a cost, in exchange for being
less intrusive.

Good luck, Jon.
j***@gmail.com
2018-09-03 20:44:18 UTC
Agreed, Jon! I did not know earlier that autofill can be called multiple
times; I thought that I might need to implement a lot of functionality to be
able to do it. Now that I find that it is not so, I will probably try it.
Thanks a lot! :)