For parsing XML on Android, the Android Developers documentation recommends XmlPullParser.
There are two ways to create an XmlPullParser object; both methods return an XmlPullParser.
XmlPullParserFactory.newInstance().newPullParser()
Xml.newPullParser()
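Parsing is simpler than you might expect: pass the InputStream obtained from the URL to the XmlPullParser object, then read the data from it.
The most frequently used methods are listed below.
setFeature()
Changes the general behaviour of the parser, such as namespace processing or doctype declaration handling. It must be called before the first call to next() or nextToken(); otherwise an exception is thrown.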
setInput()
Sets the input stream the parser is going to process. This call resets the parser state and sets the event type to the initial value START_DOCUMENT.
NOTE: If an input encoding string is passed, it MUST be used. Otherwise, if inputEncoding is null, the parser SHOULD try to determine the input encoding following the XML 1.0 specification. If encoding detection is supported, then the feature http://xmlpull.org/v1/doc/features.html#detect-encoding MUST be true; otherwise it must be false.
require()
Test if the current event is of the given type and if the namespace and name do match. null will match any namespace and any name. If the test is not passed, an exception is thrown. The exception text indicates the parser position, the expected event and the current event that is not meeting the requirement.
next()
Get next parsing event - element content will be coalesced and only one TEXT event must be returned for whole element content (comments and processing instructions will be ignored and entity references must be expanded or exception must be thrown if entity reference can not be expanded). If element content is empty (content is "") then no TEXT event will be reported.
NOTE: an empty element (such as <tag/>) will be reported with two separate events: START_TAG, END_TAG. It must be so to preserve parsing equivalency of an empty element to <tag></tag>. (See isEmptyElementTag().)
nextTag()
Call next() and return event if it is START_TAG or END_TAG otherwise throw an exception. It will skip whitespace TEXT before actual tag if any.
getText()
Returns the text content of the current event as String. The value returned depends on current event type, for example for TEXT event it is element content (this is typical case when next() is used). See description of nextToken() for detailed description of possible returned values for different types of events.
NOTE: in case of ENTITY_REF, this method returns the entity replacement text (or null if not available). This is the only case where getText() and getTextCharacters() return different values.
getName()
For START_TAG or END_TAG events, the (local) name of the current element is returned when namespaces are enabled. When namespace processing is disabled, the raw name is returned. For ENTITY_REF events, the entity name is returned. If the current event is not START_TAG, END_TAG, or ENTITY_REF, null is returned.
Please note: To reconstruct the raw element name when namespaces are enabled and the prefix is not null, you will need to add the prefix and a colon to localName.
getAttributeValue()
Returns the attribute's value identified by the namespace URI and the local name. If namespaces are disabled, namespace must be null. If the current event type is not START_TAG, an IndexOutOfBoundsException is thrown.
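To make the methods listed above concrete, here is a minimal sketch that walks a small in-memory document with next(), getName(), getText() and getAttributeValue(). It assumes an XmlPull implementation is available (Android ships one; on a plain JVM a library such as kXML2 would have to be on the classpath). The function name collectSources and the sample XML are made up for illustration and are not part of the original post.

```kotlin
import org.xmlpull.v1.XmlPullParser
import org.xmlpull.v1.XmlPullParserFactory
import java.io.StringReader

// Hypothetical helper: returns (url attribute, text) for every <source> element.
fun collectSources(xml: String): List<Pair<String?, String>> {
    val parser = XmlPullParserFactory.newInstance().newPullParser()
    // Features must be set before the first call to next().
    parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false)
    parser.setInput(StringReader(xml))

    val result = mutableListOf<Pair<String?, String>>()
    var event = parser.eventType // START_DOCUMENT right after setInput()
    while (event != XmlPullParser.END_DOCUMENT) {
        if (event == XmlPullParser.START_TAG && parser.name == "source") {
            // Attributes can only be read while positioned on START_TAG.
            val url = parser.getAttributeValue(null, "url") // null if absent
            // An empty element reports END_TAG immediately, with no TEXT event.
            val text = if (parser.next() == XmlPullParser.TEXT) parser.text else ""
            result += url to text
        }
        event = parser.next()
    }
    return result
}

fun main() {
    // Made-up sample input for illustration.
    val xml = """<item><title>Hello</title><source url="https://example.com">Example</source></item>"""
    println(collectSources(xml))
}
```

A Reader is used here instead of a URL stream only so that the sketch needs no network access; with a real feed, setInput(inputStream, null) would take its place.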
The example source below parses the Google News RSS feed (https://news.google.com/rss).
1. Analyzing the data
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<generator>NFE/5.0</generator>
<title>주요 뉴스 - Google 뉴스</title>
<link>https://news.google.com/?hl=ko&gl=KR&ceid=KR:ko</link>
<language>ko</language>
<webMaster>news-webmaster@google.com</webMaster>
<copyright>2020 Google Inc.</copyright>
<lastBuildDate>Mon, 06 Apr 2020 09:15:27 GMT</lastBuildDate>
<description>Google 뉴스</description>
<item>
<title>전 국민에 다 주자는 재난지원금…재정 뒷감당은 누가 하나 - 중앙일보 - 중앙일보 모바일</title>
<link>
https://news.google.com/__i/rss/rd/articles/CBMiJ2h0dHBzOi8vbmV3cy5qb2lucy5jb20vYXJ0aWNsZS8yMzc0ODQzNtIBK2h0dHBzOi8vbW5ld3Muam9pbnMuY29tL2FtcGFydGljbGUvMjM3NDg0MzY?oc=5
</link>
<guid isPermaLink="false">52782270364110</guid>
<pubDate>Mon, 06 Apr 2020 08:55:14 GMT</pubDate>
<description>
<ol><li><a href="https://news.google.com/__i/rss/rd/articles/CBMiJ2h0dHBzOi8vbmV3cy5qb2lucy5jb20vYXJ0aWNsZS8yMzc0ODQzNtIBK2h0dHBzOi8vbW5ld3Muam9pbnMuY29tL2FtcGFydGljbGUvMjM3NDg0MzY?oc=5" target="_blank">전 국민에 다 주자는 재난지원금…재정 뒷감당은 누가 하나 - 중앙일보</a> <font color="#6f6f6f">중앙일보 모바일</font></li><li><a href="https://news.google.com/__i/rss/rd/articles/CBMiOGh0dHA6Ly93d3cuaGFuaS5jby5rci9hcnRpL3BvbGl0aWNzL2Fzc2VtYmx5LzkzNTc3NC5odG1s0gEA?oc=5" target="_blank">민주당, 소득 관계없이 '4인 가족' 100만원 재난지원금 추진</a> <font color="#6f6f6f">한겨레</font></li><li><a href="https://news.google.com/__i/rss/rd/articles/CBMiK2h0dHBzOi8vd3d3LnlvdXR1YmUuY29tL3dhdGNoP3Y9Z0JoSnczTmdiRHfSAQA?oc=5" target="_blank">통합당의 긴급재난지원금, "그때는 틀리고 지금은 맞다"</a> <font color="#6f6f6f">노컷브이</font></li><li><a href="https://news.google.com/__i/rss/rd/articles/CBMiOmh0dHA6Ly93d3cuZG9uZ2EuY29tL25ld3MvYXJ0aWNsZS9hbGwvMjAyMDA0MDYvMTAwNTE4ODM3LzHSATZodHRwOi8vd3d3LmRvbmdhLmNvbS9uZXdzL2FtcC9hbGwvMjAyMDA0MDYvMTAwNTE4ODM3LzE?oc=5" target="_blank">전국민 재난지원금 가능할까?…“일괄지급 후 세금 환수하면 돼”</a> <font color="#6f6f6f">동아일보</font></li><li><a href="https://news.google.com/__i/rss/rd/articles/CBMiSGh0dHBzOi8vbmV3cy5jaG9zdW4uY29tL3NpdGUvZGF0YS9odG1sX2Rpci8yMDIwLzA0LzA2LzIwMjAwNDA2MDAwOTMuaHRtbNIBSmh0dHBzOi8vbS5jaG9zdW4uY29tL25ld3MvYXJ0aWNsZS5hbXAuaHRtbD9zbmFtZT1uZXdzJmNvbnRpZD0yMDIwMDQwNjAwMDkz?oc=5" target="_blank">"왜 우린 안주나" 지원금 탈락자들 부글부글</a> <font color="#6f6f6f">조선일보</font></li><li><strong><a href="https://news.google.com/stories/CAAqOQgKIjNDQklTSURvSmMzUnZjbmt0TXpZd1NoTUtFUWpPOC1hamxZQU1FUTR0ZHBjZlk2UXZLQUFQAQ?oc=5" target="_blank">Google 뉴스에서 전체 콘텐츠 보기</a></strong></li></ol>
</description>
<source url="https://news.joins.com">중앙일보 모바일</source>
</item>
<item>
...
</item>
...
</channel>
</rss>
2. Creating the data classes (Feed, FeedMessage)
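import java.util.Date

data class Feed(
    val title: String,
    val link: String,
    val language: String,
    val webMaster: String,
    val copyright: String,
    val lastBuildDate: Date,
    val description: String,
    val entries: List<FeedMessage>
)

// Fields are mutable and have defaults so that the parser can start from an
// empty FeedMessage() and fill it in tag by tag.
data class FeedMessage(
    var title: String? = null,
    var link: String? = null,
    var guid: Long = 0L,
    var pubDate: Date? = null,
    var description: String? = null,
    var source: Source? = null
) {
    // Maps <source url="...">name</source>.
    data class Source(val name: String?, val url: String?)
}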
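The data classes declare lastBuildDate and pubDate as Date, while the feed delivers them as text such as "Mon, 06 Apr 2020 08:55:14 GMT". The following is a small sketch of one way to convert such a string, using SimpleDateFormat. The helper name parseRssDate is hypothetical and does not appear in the original code, and the pattern assumes the feed always uses the date layout shown in the sample data.

```kotlin
import java.text.ParseException
import java.text.SimpleDateFormat
import java.util.Date
import java.util.Locale

// Hypothetical helper (not part of the original post): converts an RSS date
// string such as "Mon, 06 Apr 2020 08:55:14 GMT" into a java.util.Date.
// Returns null for missing or unparseable input.
fun parseRssDate(text: String?): Date? {
    if (text.isNullOrBlank()) return null
    // Locale.US keeps the English day and month names parseable
    // regardless of the device locale.
    val format = SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss z", Locale.US)
    return try {
        format.parse(text.trim())
    } catch (e: ParseException) {
        null
    }
}

fun main() {
    val date = parseRssDate("Mon, 06 Apr 2020 08:55:14 GMT")
    println(date?.time) // milliseconds since the epoch, or null if parsing failed
}
```

A new SimpleDateFormat is created on every call because the class is not thread-safe; that is simple, though not the cheapest option for large feeds.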
3. Creating the RSS processing class (XML parsing class)
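import android.util.Xml
import org.xmlpull.v1.XmlPullParser
import org.xmlpull.v1.XmlPullParserException
import java.io.IOException
import java.net.MalformedURLException
import java.net.URL

object RssFeedParser {

    // Downloads and parses the feed, returning the <item> entries that were read.
    // This performs network I/O, so call it off the main thread.
    @JvmStatic
    fun readFeed(url: String): List<FeedMessage> {
        val feedMessageList = mutableListOf<FeedMessage>()
        try {
            URL(url).openStream().use { ips ->
                val parser = Xml.newPullParser()
                parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false)
                parser.setInput(ips, null) // null: let the parser detect the encoding
                parser.nextTag()
                parser.require(XmlPullParser.START_TAG, null, "rss")
                // The <item> elements are children of <channel>, so step into it first.
                parser.nextTag()
                parser.require(XmlPullParser.START_TAG, null, "channel")
                while (parser.next() != XmlPullParser.END_TAG) {
                    if (parser.eventType != XmlPullParser.START_TAG)
                        continue
                    if (parser.name == "item") {
                        feedMessageList.add(readFeedMessage(parser))
                    } else {
                        skip(parser)
                    }
                }
            }
        } catch (e: MalformedURLException) {
            e.printStackTrace()
        } catch (e: IOException) {
            e.printStackTrace()
        } catch (e: XmlPullParserException) {
            e.printStackTrace()
        }
        return feedMessageList
    }

    @Throws(IOException::class, XmlPullParserException::class)
    private fun readFeedMessage(parser: XmlPullParser): FeedMessage {
        val feedMessage = FeedMessage()
        parser.require(XmlPullParser.START_TAG, null, "item")
        while (parser.next() != XmlPullParser.END_TAG) {
            if (parser.eventType != XmlPullParser.START_TAG)
                continue
            val name = parser.name
            // readTextFromParser() leaves the parser on the child's END_TAG.
            val data = readTextFromParser(parser, name)
            val value = data[name]
            when (name) {
                "title" -> feedMessage.title = value
                "link" -> feedMessage.link = value
                "guid" -> feedMessage.guid = setGuid(value)
                "description" -> feedMessage.description = value
                "source" -> feedMessage.source = FeedMessage.Source(value, data["url"])
                else -> {
                    // Other tags (e.g. pubDate) are ignored here.
                }
            }
        }
        return feedMessage
    }

    // Reads the text of a simple element (plus the url attribute of <source>).
    // Assumes the element holds plain or escaped text, not nested elements.
    @Throws(IOException::class, XmlPullParserException::class)
    private fun readTextFromParser(
        parser: XmlPullParser,
        tag: String
    ): Map<String, String> {
        val map = HashMap<String, String>()
        parser.require(XmlPullParser.START_TAG, null, tag)
        if (tag == "source") {
            // getAttributeValue() returns null when the attribute is missing.
            parser.getAttributeValue(null, "url")?.let { map["url"] = it }
        }
        if (parser.next() == XmlPullParser.TEXT) {
            map[tag] = parser.text.trim()
            parser.nextTag()
        }
        // For an empty element, next() above already returned END_TAG.
        parser.require(XmlPullParser.END_TAG, null, tag)
        return map
    }

    // Skips the element the parser is currently on, including nested children.
    @Throws(IOException::class, XmlPullParserException::class)
    private fun skip(parser: XmlPullParser) {
        check(parser.eventType == XmlPullParser.START_TAG) { "skip() must start on a START_TAG" }
        // Track the nesting depth until the matching END_TAG has been consumed.
        var depth = 1
        while (depth != 0) {
            when (parser.next()) {
                XmlPullParser.END_TAG -> depth--
                XmlPullParser.START_TAG -> depth++
            }
        }
    }

    // A numeric guid is used as-is; otherwise fall back to the sum of its character codes.
    private fun setGuid(value: String?): Long {
        return value?.toLongOrNull() ?: run {
            var guid = 0L
            value?.forEach {
                guid += it.code.toLong()
            }
            guid
        }
    }
}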
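The guid handling in setGuid() can be tried in isolation. The sketch below reproduces its logic as a standalone function so the fallback for non-numeric guids is easy to see. The name guidToLong is hypothetical and chosen only for this illustration.

```kotlin
// Hypothetical standalone version of the guid conversion, for illustration only.
fun guidToLong(value: String?): Long {
    if (value == null) return 0L
    // Numeric guid: use it directly.
    // Otherwise: fall back to the sum of the character codes.
    return value.toLongOrNull()
        ?: value.fold(0L) { acc, c -> acc + c.code.toLong() }
}

fun main() {
    println(guidToLong("52782270364110")) // numeric guid from the sample feed
    println(guidToLong("ab"))             // non-numeric guid: 'a' (97) + 'b' (98)
    println(guidToLong("ba"))             // same characters, same sum
}
```

Because the fallback only adds up character codes, different non-numeric guids made of the same characters map to the same Long. If guids are used as unique keys, keeping the guid as a String may be the safer design.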
References:
https://developer.android.com/training/basics/network-ops/xml?hl=ko#consume
https://developer.android.com/reference/org/xmlpull/v1/XmlPullParser?hl=ko