Cross Validated - What is a good general-purpose plain-text data format like the one used for BibTeX? - 吾爱随笔录



2022-03-12 09:45:21

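Context

I'm writing some multiple-choice practice questions and I'd like to store them in a simple plain-text data format. I previously used tab-delimited files, but that made editing in a text editor a bit awkward. I'd like to use a format somewhat like BibTeX.

For example,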

@Article{journals/aim/Sloman99,
  title =   "Review of Affective Computing",
  author =  "Aaron Sloman",
  journal = "AI Magazine",
  year =    "1999",
  number =  "1",
  volume =  "20",
  url = "http://dblp.uni-trier.de/db/journals/aim/aim20.html#Sloman99",
  pages =   "127--133",
}

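The important properties seem to be:

The data consist of records
Each record has several attribute-value pairs
Each attribute-value pair can start on a new line, but may span several lines
Text data are easy to enter by hand in a text editor
Off-the-shelf tools exist to convert the data to tabular form

For example, something like this might work: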

@
id: 1
question: 1 + 1
a: 1
b: 2
c: 3
d: 4
correct: b

@
id: 2
question: What is the capital city of the country renowned for koalas, 
          emus, and kangaroos?
a: Canberra
b: Melbourne
c: Sydney
d: Australia
correct: a

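Although I'm interested in the specific context of writing multiple-choice questions, I'm also interested in the broader question of representing data in this or a similar format.

Initial thoughts

My initial thoughts included the following:

YAML
JSON
Delimited data with custom field and record separators that allow multi-line records
A custom file format with some kind of custom parser

I've only had a quick look at YAML and JSON; my first impression is that they might be overkill. Custom delimiters might be fine, but they would probably require every field to appear in the same order in every record. Writing my own parser sounds a bit tedious.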
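To give a sense of how much work a custom parser for the @-delimited format above involves, here is a minimal sketch in Ruby (the language used by one of the answers below). It assumes a line containing only @ starts a record, a line of the form key: value starts a field, and an indented line continues the previous field. The method names parse_records and to_tsv are hypothetical. Because each record is stored as a hash, fields may appear in any order, and the table's columns are built from every key seen.

```ruby
# Minimal parser for the @-delimited question format (a sketch of an ad hoc
# format, not a standard): "@" on its own line starts a record, "key: value"
# starts a field, and an indented line continues the previous field.
def parse_records(text)
  records = []
  current = nil
  last_key = nil
  text.each_line do |raw|
    line = raw.chomp
    if line.strip == '@'
      current = {}
      records << current
      last_key = nil
    elsif line.strip.empty?
      next # blank lines only separate records visually
    elsif line.match?(/\A\s/) && last_key
      # Continuation line: append to the previous value with one space.
      current[last_key] = "#{current[last_key]} #{line.strip}"
    elsif (m = line.match(/\A([^:\s]+):\s*(.*)\z/))
      raise ArgumentError, "field before first @: #{line}" if current.nil?
      last_key = m[1]
      current[last_key] = m[2].strip
    else
      raise ArgumentError, "cannot parse line: #{line}"
    end
  end
  records
end

# Flatten records into tab-separated text. Columns are the union of all keys
# in order of first appearance; a missing field becomes an empty cell.
def to_tsv(records)
  columns = records.flat_map(&:keys).uniq
  rows = records.map { |r| columns.map { |c| r.fetch(c, '') } }
  ([columns] + rows).map { |row| row.join("\t") }.join("\n")
end

SAMPLE = <<~TEXT
  @
  id: 1
  question: 1 + 1
  a: 1
  b: 2
  c: 3
  d: 4
  correct: b

  @
  id: 2
  question: What is the capital city of the country renowned for koalas,
            emus, and kangaroos?
  a: Canberra
  b: Melbourne
  c: Sydney
  d: Australia
  correct: a
TEXT

puts to_tsv(parse_records(SAMPLE))
```

The tab-separated output can then be loaded by spreadsheet software or by R's read.delim; it assumes no value contains a tab.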

4 Answers

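Why not use XML? There are many good parsers that convert XML files directly into data structures, including one for R (http://cran.r-project.org/web/packages/XML/index.html).

The format looks like this (example adapted from http://www.w3schools.com/xml/default.asp):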

<?xml version="1.0"?>
<notes>
    <note>
        <to>Tove</to>
        <from>Jani</from>
        <heading>Reminder</heading>
        <body>Don't forget me this weekend!</body>
    </note>
    <note>
        <to>Janis</to>
        <from>Cardinal</from>
        <heading>Reminder</heading>
        <body>Don't forget me next weekend!</body>
    </note>
</notes>

For example, with the XML package,

library(XML)
z=xmlTreeParse("test.xml")
z$doc$children$notes

gives access to the whole notes element, while

z$doc$children$notes[1]

gives just the first node, and so on.
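For comparison, here is roughly the same idea in Ruby, using the REXML library that ships with Ruby (as a bundled gem in recent versions) instead of R's XML package. It is a sketch: the method name notes_to_rows is hypothetical, and it assumes each note contains only simple text children, so each note becomes one row and each child element one column.

```ruby
require 'rexml/document'

# The notes example from the answer, inlined so the sketch is self-contained;
# in practice you would pass File.read("test.xml") instead.
NOTES_XML = <<~DOC
  <?xml version="1.0"?>
  <notes>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
    <note>
      <to>Janis</to>
      <from>Cardinal</from>
      <heading>Reminder</heading>
      <body>Don't forget me next weekend!</body>
    </note>
  </notes>
DOC

# Turn each <note> into a Hash of child-element name => text content.
def notes_to_rows(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '/notes/note').map do |note|
    # elements.to_a returns only element children, skipping whitespace text
    note.elements.to_a.to_h { |child| [child.name, child.text.to_s] }
  end
end

notes_to_rows(NOTES_XML).each { |row| p row }
```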

I'd go with YAML. It is easy to edit, and there are plenty of parsers for different languages:

---
- question: 1 + 1
  incorrect:
    - 1
    - 3
    - 4
  correct: 2
- question: What is the capital city of the country renowned for koalas, emus, and kangaroos?
  incorrect:
    - Melbourne
    - Sydney
    - Australia
  correct: Canberra

You could then write a small script that randomly mixes the incorrect answers with the correct one and outputs the LaTeX suggested in DQdlM's answer.

Edit: this Ruby script

require 'yaml'

# Usage: ruby <this script> <questions file>.yml
questions = YAML.load_file(ARGV.first)
questions.each do |question|
  # One \choice line per incorrect answer, plus the \CorrectChoice line.
  answers = question['incorrect'].map { |i| '    \choice ' + i.to_s }
  answers << '    \CorrectChoice ' + question['correct'].to_s

  # The exam class numbers questions automatically, so \question takes no number.
  output = ['\question', question['question'].to_s]
  output << '  \begin{choices}'
  output.concat(answers.shuffle) # random order of the choices
  output << '  \end{choices}'

  puts output
  puts # blank line between questions
end

produces output like the following (the choices are shuffled, so their order differs between runs):

\question
1 + 1
  \begin{choices}
    \choice 4
    \choice 1
    \choice 3
    \CorrectChoice 2
  \end{choices}

\question
What is the capital city of the country renowned for koalas, emus, and kangaroos?
  \begin{choices}
    \choice Melbourne
    \choice Sydney
    \CorrectChoice Canberra
    \choice Australia
  \end{choices}
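The same YAML file can also be flattened into a table, which covers the question's wish to convert the data to tabular form. A minimal Ruby sketch using the standard YAML library, assuming no answer contains a tab or newline; the method name questions_to_tsv and the column names incorrect_1 to incorrect_n are hypothetical choices, not part of the answer above.

```ruby
require 'yaml'

QUESTIONS_YAML = <<~DOC
  ---
  - question: 1 + 1
    incorrect: [1, 3, 4]
    correct: 2
  - question: What is the capital city of the country renowned for koalas, emus, and kangaroos?
    incorrect: [Melbourne, Sydney, Australia]
    correct: Canberra
DOC

# One row per question: question text, correct answer, then the incorrect
# answers padded with empty cells to the widest question.
def questions_to_tsv(yaml_text)
  questions = YAML.safe_load(yaml_text)
  width = questions.map { |q| q['incorrect'].size }.max
  header = %w[question correct] + (1..width).map { |i| "incorrect_#{i}" }
  rows = questions.map do |q|
    wrong = q['incorrect'].map(&:to_s)
    [q['question'].to_s, q['correct'].to_s] + wrong + [''] * (width - wrong.size)
  end
  ([header] + rows).map { |row| row.join("\t") }.join("\n")
end

puts questions_to_tsv(QUESTIONS_YAML)
```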

This may not fully cover applications beyond multiple-choice questions, but there is an exam class for LaTeX.

Multiple-choice questions are written like this:

\question[2]
The fascicle of a nerve is surrounded by what connective tissue layer?
  \begin{choices}
    \choice endoneurium
    \choice epineurium
    \CorrectChoice perineurium
    \choice neurolemma
    \choice none of the above
  \end{choices}

Putting \printanswers in your preamble highlights the correct answer.

Org mode can do this. Here is one way:

#+COLUMNS: %id %a %b %c %d %correct

* 1 + 1  
    :PROPERTIES:
    :id:       1
    :a:        1
    :b:        2
    :c:        3
    :d:        4
    :correct:  b
    :END:

* What is the capital city of the country renowned for koalas, emus, and kangaroos?
    :PROPERTIES:
    :id:       2
    :a:        Canberra
    :b:        Melbourne
    :c:        Sydney
    :d:        Australia
    :correct:  a
    :END:
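The property drawers are regular enough to be read without Emacs too. A rough Ruby sketch, assuming every record is a top-level "* " heading followed by a single :PROPERTIES: ... :END: drawer of ":key: value" lines; real Org syntax has many more cases, so this is not a general Org parser. The method name org_property_rows and the extra "heading" key are hypothetical.

```ruby
# Read the property drawer under each top-level "* " heading into a Hash.
# Simplified reading of Org syntax: one drawer per heading, ":key: value" lines.
def org_property_rows(text)
  rows = []
  in_drawer = false
  text.each_line do |raw|
    line = raw.strip
    if raw.start_with?('* ')
      # New record; keep the heading text under a hypothetical "heading" key.
      rows << { 'heading' => raw.delete_prefix('* ').strip }
      in_drawer = false
    elsif line == ':PROPERTIES:'
      in_drawer = true
    elsif line == ':END:'
      in_drawer = false
    elsif in_drawer && !rows.empty? && (m = line.match(/\A:([^:\s]+):\s*(.*)\z/))
      rows.last[m[1]] = m[2]
    end
  end
  rows
end

ORG_SAMPLE = <<~ORG
  #+COLUMNS: %id %a %b %c %d %correct

  * 1 + 1
      :PROPERTIES:
      :id:       1
      :a:        1
      :b:        2
      :c:        3
      :d:        4
      :correct:  b
      :END:

  * What is the capital city of the country renowned for koalas, emus, and kangaroos?
      :PROPERTIES:
      :id:       2
      :a:        Canberra
      :b:        Melbourne
      :c:        Sydney
      :d:        Australia
      :correct:  a
      :END:
ORG

org_property_rows(ORG_SAMPLE).each { |row| p row }
```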

If you want a quick summary table to inspect visually, insert the following

* The column view

  #+BEGIN: columnview :hlines 1 :id global

  #+END:

Put the cursor inside the #+BEGIN block and press C-c C-x C-u to get

#+BEGIN: columnview :hlines 1 :id global
| id | a        | b         | c      | d         | correct |
|----+----------+-----------+--------+-----------+---------|
|  1 | 1        | 2         | 3      | 4         | b       |
|  2 | Canberra | Melbourne | Sydney | Australia | a       |
|    |          |           |        |           |         |
#+END:

If you want to import the table (into R, for example), insert a table name like this:

#+BEGIN: columnview :hlines 1 :id global
#+tblname: simpleDF
| id | a        | b         | c      | d         | correct |
|----+----------+-----------+--------+-----------+---------|
|  1 | 1        | 2         | 3      | 4         | b       |
|  2 | Canberra | Melbourne | Sydney | Australia | a       |
#+END:

Then insert the following R code block and evaluate it with C-c C-c:

#+begin_src R :session *R* :var df=simpleDF :colnames yes
head(df)
#+end_src

This gives

#+results:
| id | a        | b         | c      | d         | correct |
|----+----------+-----------+--------+-----------+---------|
|  1 | 1        | 2         | 3      | 4         | b       |
|  2 | Canberra | Melbourne | Sydney | Australia | a       |

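The nice thing is that the data frame df is now stored in the active *R* session and can be post-processed as needed. Having said all that, if it were me, I would probably start with the exams package (in R) for the specific task of storing and writing multiple-choice questions, although that YAML example does look really cool.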
