Help with obfuscated URLs?
by jamtat from LinuxQuestions.org on (#6PHJ8)
Hope this is the right forum for this type of posting/query.
Hi. I'm a terminal/command-line type of guy who has never lost a deep appreciation for text-mode browsers: think elinks, lynx, w3m and the like. Once RSS and command-line RSS readers like Newsbeuter/Newsboat came along, I was in dreamland so far as news consumption goes. That, of course, was augmented by Google News, which pulls together a wide variety of news sources in a single place. But Google has become increasingly less accommodating to those who take alternative approaches such as the ones I favor. And now my news-reading scheme has broken down due, I believe, to the way Google News is now obfuscating news-story URLs.
Take, for example, the following link from yesterday's national news, as found among the Google News RSS links: https://news.google.com/rss/articles...GF3Vk9nZw?oc=5. If I enter that URL into a graphical browser, the browser resolves it and displays the linked web page at USAToday (actual link is https://www.usatoday.com/story/news/...e/74551420007/). Using my favored text-mode browsers on that same obfuscated link, however, I get a blank page with a single "home" hyperlink on it. I can get to the Google News home page (https://news.google.com/home?lfhs=2&...=US&ceid=US:en) by following that link, but any subsequent link I try to follow there results in the same blank page with the single "home" hyperlink on it. So, a fruitless cycle.
The upshot is that a graphical browser can somehow resolve/deobfuscate the link, while text-mode browsers cannot. For example, when I feed lynx the lengthy obfuscated Google URL listed above, I get the blank page with a single hyperlink described above. But if I feed it the USAToday link that the obfuscated URL resolves to in Chromium, lynx can display the page located at https://www.usatoday.com/story/news/...e/74551420007/. I'm looking for input as to why this is, and I'm hoping to discover how these obfuscated links can be decoded so I can continue feeding the decoded links to a text-mode browser and consume news in my preferred fashion.
I should mention that I've seen commentary on the interwebs indicating that these obfuscated links are base64 encoded. Based on those comments, I've tried various online base64 decoders as well as a command-line one in an attempt to get a valid URL out of them (that is, out of the part of the URL that follows articles/ in Google's obfuscated URL format). But so far those attempts have failed to produce a valid URL of the sort I can feed to a text-mode browser. Admittedly I have a rather poor understanding of base64 and of file encoding in general, so I may be making some fairly basic mistake in my attempts thus far.
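For anyone who wants to repeat the base64 experiment, here is a minimal Python sketch of what I understand the decoding step to involve. It rests on a few assumptions that I have not confirmed: that the part after articles/ uses the URL-safe base64 alphabet (- and _ instead of + and /) with the trailing = padding stripped, which would be one possible reason a plain command-line decoder chokes on it, and that the decoded bytes are a binary wrapper that may or may not contain the story URL as readable text. The function names are my own inventions. If no URL is embedded in the bytes, find_url simply returns None, so this is a diagnostic aid and not a guaranteed decoder.

```python
import base64
import re
from urllib.parse import urlparse


def article_token(feed_url: str) -> str:
    """Return the path segment after 'articles/', with the query string dropped."""
    path = urlparse(feed_url).path
    return path.rsplit("/articles/", 1)[-1]


def decode_token(token: str) -> bytes:
    """Decode URL-safe base64, restoring the '=' padding that URLs usually omit."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded)


def find_url(raw: bytes):
    """Return the first http(s) URL found in the raw bytes, or None if there is none."""
    match = re.search(rb"https?://[\x21-\x7e]+", raw)
    return match.group(0).decode("ascii") if match else None


def try_decode(feed_url: str):
    """Convenience wrapper: token -> bytes -> embedded URL (or None)."""
    return find_url(decode_token(article_token(feed_url)))
```

Calling try_decode() with a full link copied from the RSS feed either prints a plain URL or None. A None result would suggest the token is only an opaque ID and that the real address has to be fetched from Google by some other means.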
In any case, I'm looking for help in my attempts to continue consuming news using my favored text-mode browser solution. Perhaps I'll simply need to search for some alternate RSS aggregator that does not obfuscate links, should such a source actually exist. I've so far not found one that has the variety of news headlines/stories that Google has been providing. But I plan to continue searching nonetheless. Pointers on that front will be appreciated as well.
Thanks
PS: For whatever this may be worth, here's the information lynx displays about the page that results when I try to follow the obfuscated Google link (excuse the funky line wrapping):
File that you are currently viewing
Linkname: Google News
URL: https://news.google.com/rss/articles...=US&ceid=US:en
Charset: utf-8
Server: ESF
Date: Sat, 27 Jul 2024 07:15:15 GMT
Owner(s): None
size: 0 lines
mode: normal
Link that you currently have selected
Linkname: home
URL: https://news.google.com/?lfhs=2
Server Headers:
Content-Type: text/html; charset=utf-8
Vary: Sec-Fetch-Dest, Sec-Fetch-Mode, Sec-Fetch-Site
x-ua-compatible: IE=edge
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Mon, 01 Jan 1990 00:00:00 GMT
Date: Sat, 27 Jul 2024 07:15:15 GMT
Strict-Transport-Security: max-age=31536000
Content-Security-Policy: script-src 'nonce-5alw_wQUcy9PrP-ZzV-bwQ' 'unsafe-inline';object-src 'none';base-uri 'self';report-uri /_/DotsSplashUi/cspreport;worker-src 'self'
Content-Security-Policy: require-trusted-types-for 'script';report-uri /_/DotsSplashUi/cspreport
Permissions-Policy: ch-ua-arch=*, ch-ua-bitness=*, ch-ua-full-version=*, ch-ua-full-version-list=*, ch-ua-model=*, ch-ua-wow64=*, ch-ua-form-factors=*, ch-ua-platform=*, ch-ua-platform-version=*
Accept-CH: Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Full-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Model, Sec-CH-UA-WoW64, Sec-CH-UA-Form-Factors, Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version
reporting-endpoints: default="/_/DotsSplashUi/web-reports?context=eJzjCtDikmJw1ZBikPj6kkkNiJ3SZ7AGALFP_QzWKCD-tGMGa-vNc6yTgTjp33nWAiBeEnGR9UDiRdbJ4pdYDRUusdoDcUDuJdbm_EusQtwcS_ZO2somsOLz6Xgl5aT8wviU_JLi4oKcxOKM4tSist SieCMDIxMDcyNjPUPD-AIjAFapMsY"
Content-Encoding: gzip
Server: ESF
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000