27 January 2015

Parse Google News RSS images in C#

If you want to pull the image URLs out of a Google News RSS feed, for example http://news.google.com/news?hl=us&q=android&output=rss, the following code can be used.

It matches every img src in the source text and fills a list with the value of each src attribute.

// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
private static IEnumerable<string> GetImagesInGoogleNewsString(string htmlString)
{
    var imgSrcs = new List<string>();

    // Capture the double-quoted value of a src attribute that directly follows "<img".
    const string pattern = @"<\s*img\s+src\s*=\s*""\s*([^""]+?)\s*""";
    var imgSrcMatches = Regex.Matches(htmlString, pattern,
        RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);

    foreach (Match match in imgSrcMatches)
    {
        // Prepend a scheme only to protocol-relative URLs (//host/path),
        // which is the form the "http:" prefix assumes.
        string src = match.Groups[1].Value;
        imgSrcs.Add(src.StartsWith("//", System.StringComparison.Ordinal) ? "http:" + src : src);
    }

    return imgSrcs;
}
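
The method above expects HTML text, so the feed has to be turned into HTML strings first. Here is a minimal sketch of that step, assuming the image markup sits HTML-escaped inside each item's description element, as is common for RSS. The feed fragment, the example.com URL, and the names GoogleNewsImageDemo and GetImageSrcs are made up for illustration; a real feed would be loaded from the URL instead of parsed from a string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class GoogleNewsImageDemo
{
    // Hypothetical helper with the same idea as the method above:
    // capture the value of the src attribute of each <img> tag.
    private static IEnumerable<string> GetImageSrcs(string html)
    {
        const string pattern = @"<\s*img\b[^>]*?\ssrc\s*=\s*""\s*([^""]+?)\s*""";
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            string src = m.Groups[1].Value;
            // Add a scheme only when the URL is protocol-relative.
            yield return src.StartsWith("//", StringComparison.Ordinal) ? "http:" + src : src;
        }
    }

    public static void Main()
    {
        // Hand-written sample, not real feed output.
        const string rss =
            @"<rss><channel><item><title>Sample</title>" +
            @"<description>&lt;img src=&quot;//example.com/a.jpg&quot; alt=&quot;&quot;&gt; text</description>" +
            @"</item></channel></rss>";

        XDocument doc = XDocument.Parse(rss);
        foreach (XElement item in doc.Descendants("item"))
        {
            // Reading the element as a string unescapes the XML entities,
            // so the regex sees ordinary <img ...> markup.
            string description = (string)item.Element("description") ?? string.Empty;
            foreach (string src in GetImageSrcs(description))
                Console.WriteLine(src);
        }
    }
}
```

For a live feed, XDocument.Load can take the feed URL in place of XDocument.Parse. A regex is enough for the simple markup assumed here; if the descriptions turn out to contain more varied HTML, an HTML parser would be the safer choice.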
